(57) Abstract

The invention concerns retroviral strains of the non-M, non-O HIV-1 group,
particularly a strain called YBF30, its fragments and its applications as a
diagnostic reagent and as an immunogenic agent. The HIV-1 viruses which differ
both from the M group and from the O group have the following characteristics:
little or no serological response with respect to proteins of groups M and O and
strong serological response with respect to proteins derived from the YBF30
strain or the CPZGAB SIV strain; absence of genomic amplification by the primers
of the env and gag regions of the HIV-1 viruses of groups M and O; genomic
amplification in the presence of the primers derived from the YBF30 strain; and
homology of the envelope gene products higher than 70% with respect to the YBF30
strain.






ABSTRACT 



Retroviral strains of the non-M, non-O HIV-1 group, in particular a strain
designated YBF30, its fragments and also its uses as a diagnostic reagent and
as an immunogenic agent.

The HIV-1 viruses which differ both from the M group and the O group exhibit
the following characteristics:

- little or no serological reactivity with regard to the proteins of the M and
O groups and strong serological reactivity with regard to the proteins which
are derived from the strain YBF30 according to the invention or the strain
CPZGAB SIV;

- absence of genomic amplification when using primers from the env and gag
regions of the M and O HIV-1 groups;

- genomic amplification in the presence of primers which are derived from the
YBF30 strain according to the invention; and

- homology of the products of the envelope gene which is greater than 70% with
regard to the YBF30 strain.
NON-M, NON-O HIV-1 STRAINS, FRAGMENTS AND USES

The present invention relates to retroviral strains of the non-M, non-O HIV-1
group, in particular a strain designated YBF30, to its fragments and to its
uses as a diagnostic reagent and as an immunogenic agent.

The human acquired immunodeficiency viruses HIV-1 and HIV-2 are
retrolentiviruses, which are viruses found in a large number of African
primates. All these viruses appear to have a common ancestor; however, it is
very difficult to prejudge the period at which these different viruses became
separated from this ancestor. Other viruses, which are more distant but which
nevertheless belong to the same group, are found in other mammals (ungulates
and felines).

All these viruses are associated with long 
infections; an absence of symptoms is the rule in 
monkeys which are infected naturally. 

While the origin of HIV-2 appears to be clear on account of its strong
homology with the Sooty Mangabey (West Africa) virus, no virus which is closely
related to HIV-1 has been found in monkeys. The most closely related viruses
are viruses found in two chimpanzees (CPZGAB SIV, ANT SIV).

All the lentiviruses have been found to exhibit substantial genetic
variability, and the phylogenetic study of these variants, obtained from a
large number of different geographic locations, has enabled 8 subtypes (clades)
of HIV-1 to be distinguished, all of which are equidistant from each other. The
clades are only a mathematical representation of the expression of the
variability; phenetic analysis, which is based on the amino acids rather than
on the nucleic acids, gives different results (Korber et al., 1994).

The demonstration of subtypes is in accord with a phylogenetic analysis which
does not, to date, have any pathophysiological correlation but, instead, a
geographical correspondence. This is because each subtype is mainly found in a
particular geographical area. The B subtype is predominant in Europe and the
United States whereas two subtypes, i.e. E and B, are found in Thailand, and
there is a strong correlation between the mode of transmission which, in actual
fact, corresponds to a particular population, and the subtype found. All the
clades have been found in Africa and their distribution across the rest of the
world reflects a probability of encounter between persons indulging in
high-risk behaviour. The main clade, so called because it is present in
substantial proportions in Africa, is clade A. A very great degree of
variability has been found in some African countries (G. Myers, 1994; P.M.
Sharp et al., 1994). Several subtypes have been characterized in the western
central African countries such as the Central African Republic (Murphy et al.,
1993) and Cameroon (Nkengasong et al., 1994).

Finally, patients have been characterized who are carriers of viral variants
of HIV-1, whose sera have posed detection problems for particular kits which
are sold on the French market and whose confirmatory Western blots have been
atypical (Loussert-Ajaka et al., 1994; Simon et al., 1994; PCT International
Application WO 96/27013).

Analysis of these variants has confirmed the fact that the type 1 HIV viruses
should be subdivided into two groups, i.e. the M (major) group and an O
(outlier) group, which includes these isolates, as Charneau et al., 1994 had
proposed. Analysis of the synonymous mutations/non-synonymous mutations ratio
carried out on the sequences of the known O group viruses indicates that this
new group is also ancient, even if no more ancient than the M group
(Loussert-Ajaka et al., 1995). Its low prevalence to date, i.e. 8% of patients
infected with HIV-1 in Cameroon (Zekeng et al., 1994) and 18 cases
characterized in France, is thought to be due to factors which are purely
epidemiological.
These two groups of HIV-1 form a tree which is in the shape of a double star
(Figures 9 to 19). Two isolates, i.e. CPZGAB SIV, characterized from a
chimpanzee from Gabon (Huet et al., 1990) and CPZANT SIV, characterized from a
chimpanzee in the Antwerp Zoo, possess sequences and genetic organizations
which are very closely related to HIV-1 but which do not fall within either of
these two groups and form two new branches on the phylogenetic tree.

The demonstration of new variants is important 
for developing sufficiently sensitive and specific 
reagents for detecting HIV infections, that is to say 
reagents which do not lead to false-negative or false- 
positive results, and for developing compositions which 
are protective in regard to subtypes which do not 
belong either to the M group or to the O group. 

Consequently, the invention provides a non-M, non-O strain, as well as
sequences derived from this strain, which are suitable for detecting non-M and
non-O HIV-1 variants and which do not lead to false-negative or false-positive
results being obtained. In order to do this, the inventors have, in particular,
established an algorithm for differentiating between, and confirming, group M
and group O HIV-1 infections, thereby enabling them to select non-M, non-O
variants.

The present invention relates to a non-M, non-O HIV-1 strain which exhibits
the morphological and immunological characteristics of the retrovirus which was
deposited on 2 July 1996 under number I-1753 (designated YBF30) in the
Collection Nationale de Cultures de Microorganismes (National Collection of
Microorganism Cultures), kept by the Pasteur Institute.

A non-M, non-O variant is understood as meaning a type 1 HIV which cannot
serologically and molecularly be recognized as belonging to either of these
groups.

The present invention also relates to the complete nucleotide sequence of the
strain as defined above (SEQ ID No. 1) as well as to nucleic acid fragments
which are at least 10 nucleotides in size and which are derived from the said
strain.

Fragments of this type which may be mentioned are:

- YBF 30 LTR (SEQ ID No. 2),
- YBF 30 GAG (SEQ ID No. 3) (gag gene),
- YBF 30 POL (SEQ ID No. 5) (pol gene),
- YBF 30 VIF (SEQ ID No. 7) (vif gene),
- YBF 30 VPR (SEQ ID No. 9) (vpr gene),
- YBF 30 VPU (SEQ ID No. 11) (vpu gene),
- YBF 30 TAT (SEQ ID No. 13) (tat gene),
- YBF 30 REV (SEQ ID No. 15) (rev gene),
- YBF 30 ENV gp160 (SEQ ID No. 17) (env gene),
- YBF 30 NEF (SEQ ID No. 19) (nef gene),
- the SEQ ID Nos. 21-57, also designated, respectively, YLG, LPBS.1, GAG Y
AS1.1, GAG Y AS1, GAG 6, GAG Y S1, GAG Y S1.1, GAG Y S1.2, YRT AS1.3, YRT
AS1.2, YRT AS1.1, YRT 2, YRT AS1, YRT 2.1, YRT 2.2, YRT 2.3, YRT 2.4, 4481-1,
4481-2, 4235.1, 4235.2, 4235.3, 4235.4, SK69.6, SK69.5, SK69.4, SK69.3, SK69.2,
SK69.1, SK68.1, SK68.2, SK68.3, LSI AS1.3, LSI AS1.2, LSI AS1.1, LSI A1, YLPA.

Such sequences can be used in the specific identification of a non-M, non-O
HIV-1, and as diagnostic reagents, either alone or pooled with other reagents,
for the differential identification of any HIV-1.

These sequences may, in particular, be employed in diagnostic tests which
comprise either a direct hybridization with the viral sequence to be detected
or an amplification of the said viral sequence, with these tests using, as
primers or as probes, an oligonucleotide which comprises at least 10
nucleotides and which is included in any one of the above sequences, in
particular one of the abovementioned sequences, SEQ ID Nos. 21-57.
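As a rough illustration of the oligonucleotide criterion stated above (at least 10 nucleotides, included in one of the listed sequences), the following Python sketch screens a candidate primer or probe against a reference sequence. The helper names, the decision to also accept an oligonucleotide whose reverse complement occurs in the reference (as for an antisense primer), and the toy reference string are all assumptions made for illustration; the toy string is not YBF30 sequence data.

```python
# Minimal sketch: screen an oligonucleotide against the "at least 10
# nucleotides and included in the sequence" criterion.
# Helper names are hypothetical; the reference is a made-up toy string.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_candidate_oligo(oligo: str, reference: str, min_len: int = 10) -> bool:
    """True if `oligo` has at least `min_len` bases and occurs in `reference`.

    Assumption: an antisense oligo is accepted when its reverse complement
    occurs in the reference strand.
    """
    oligo = oligo.upper()
    reference = reference.upper()
    if len(oligo) < min_len:
        return False
    return oligo in reference or reverse_complement(oligo) in reference


toy_reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"  # placeholder, not real data
sense = toy_reference[3:15]                          # 12-mer from the toy strand
antisense = reverse_complement(toy_reference[10:22]) # 12-mer on the other strand
```

Both `sense` and `antisense` pass the check against `toy_reference`, whereas a 6-mer is rejected on length alone.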

The present invention also relates to HIV-1 viruses which are characterized in
that they differ both from the M group and from the O group and exhibit the
following characteristics:

- little or no serological reactivity with regard to proteins of the M and O
groups and strong serological reactivity with regard to proteins which are
derived from the YBF30 strain or the CPZGAB SIV strain;

- absence of genomic amplification when using primers from the env and gag
regions of HIV-1 viruses of the M and O groups;

- genomic amplification in the presence of primers which are derived from the
YBF30 strain, as defined above; and

- homology of the products of the envelope gene which is > 70% with regard to
the YBF30 strain.
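The envelope criterion above can be pictured with a short Python sketch. The text does not say how the homology percentage is computed, so this example assumes the simplest reading: percent identity over two amino acid sequences that have already been aligned to equal length, with `-` marking gaps and gapped columns ignored. The function names and the threshold parameter are illustrative, not part of the specification.

```python
# Minimal sketch, assuming "homology" is read as percent identity over a
# pairwise alignment. Function names are hypothetical.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


def meets_envelope_criterion(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True when identity is strictly greater than the threshold (> 70%)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

A stricter or looser definition (for example, counting conservative substitutions as matches) would change the figures, which is why the definition is kept as an explicit assumption here.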

The invention also relates to the use of the above described sequences for
implementing a method of hybridization and/or of gene amplification of nucleic
acid sequences of the HIV-1 type, with these methods being applicable to the
in-vitro diagnosis of the potential infection of an individual with a virus of
the non-M, non-O HIV-1 type.

This in-vitro diagnostic method is carried out using a biological sample
(serum or circulating lymphocyte) and comprises:

. a step of extracting the nucleic acid which is to be detected and which
belongs to the genome of the virus, which virus may possibly be present in the
biological sample, and, where appropriate, a step of treating the nucleic acid
using a reverse transcriptase, if this nucleic acid is in RNA form,

. at least one cycle comprising the steps of denaturing the nucleic acid, of
hybridizing with at least one sequence in accordance with the invention and,
where appropriate, extending the hybrid, which has been formed, in the presence
of suitable reagents (polymerizing agent, such as DNA polymerase and dNTP), and

. a step of detecting the possible presence of the nucleic acid belonging to
the genome of a virus of the non-M, non-O HIV-1 group type.

The following conditions are employed for the PCR using the primers derived
from the YBF30 strain:

- extracting the lymphocytic DNA by means of the phenol/chloroform technique
and quantifying it by spectrophotometry at a wavelength of 260 nm. All the
amplifications are carried out using a Perkin Elmer 2400 thermocycler.

- the long (9 kb) PCRs are carried out using an XL PCR kit (Perkin Elmer) in
accordance with the manufacturer's conditions and using the dNTPs, the buffers
provided and Perkin Elmer's "hot start"; the amplification cycles of this long
PCR are:

. 1 cycle of denaturation for 2 minutes at 94°C,

. then 16 cycles: 15 seconds at 94°C, 15 seconds at 55°C, 8 minutes at 68°C,

. then 24 cycles: 15 seconds at 94°C, 15 seconds at 55°C, 8 minutes at 68°C,
adding a further 15 seconds (incrementation) to each cycle.

- the nested PCRs are carried out on the amplification products of the long
PCRs. The conditions for carrying out the nested PCRs are as follows:

. "Expand High Fidelity PCR System" Taq polymerase buffer and enzyme from
Boehringer Mannheim in accordance with the manufacturer's instructions, dNTP
and "hot start" from Perkin Elmer,

. 200 µmol of each dNTP, 20 pmol of each primer in accordance with the
invention, 5 µl of DNA, 10 µl of 10 x PCR buffer and 2.6 units of Taq
polymerase in a volume of 100 µl,

. amplification: one cycle of 2 minutes at 94°C followed by 38 cycles: 15
seconds at 94°C, 15 seconds at 55°C, a time of elongation at 72°C which varies
in accordance with the size of the PCR product to be amplified (from 30 seconds
to 2 minutes) and a final elongation cycle of 10 minutes at 72°C.
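To make the two cycling programmes above easier to compare, the following Python sketch adds up their nominal hold times. It ignores ramping between temperatures, and it assumes that the 15-second incrementation in the last 24 cycles of the long PCR is applied to the 68°C step and accumulates from the first of those cycles (cycle k is extended by 15 x k seconds). That reading is an assumption, since the text only says that 15 seconds are added to each cycle. The function names are illustrative.

```python
# Minimal sketch: nominal hold time (seconds) of the two PCR programmes.
# Ramp times are ignored; the increment interpretation is an assumption.

def long_pcr_seconds(increment_s: int = 15) -> int:
    """Long (9 kb) PCR: 2 min at 94 C, then 16 + 24 three-step cycles."""
    total = 2 * 60                      # initial denaturation at 94 C
    base_cycle = 15 + 15 + 8 * 60       # holds at 94 C, 55 C and 68 C
    total += 16 * base_cycle            # first block of 16 cycles
    for k in range(1, 25):              # second block of 24 cycles
        # assumed: extension grows by `increment_s` with every cycle
        total += base_cycle + increment_s * k
    return total


def nested_pcr_seconds(extension_s: int) -> int:
    """Nested PCR: 2 min at 94 C, 38 cycles, final 10 min at 72 C."""
    if not 30 <= extension_s <= 120:
        raise ValueError("the text gives an elongation time of 30 s to 2 min")
    total = 2 * 60                          # initial denaturation
    total += 38 * (15 + 15 + extension_s)   # 94 C, 55 C, 72 C holds
    total += 10 * 60                        # final elongation
    return total


if __name__ == "__main__":
    print(long_pcr_seconds() / 3600.0, "hours of hold time for the long PCR")
    print(nested_pcr_seconds(30) / 60.0, "minutes for the shortest nested PCR")
```

Under these assumptions the long PCR is dominated by the 8-minute extension step, which is the expected cost of amplifying a 9 kb product.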

The amplified product is preferably detected by direct sequencing.

The invention also relates to a peptide or a peptide fragment which is
characterized in that it can be expressed by a non-M, non-O HIV-1 strain or
using a nucleotide sequence as defined above, and in that it is capable: (1) of
being recognized by antibodies which are induced by a non-M, non-O HIV-1 virus,
as defined above, in particular the YBF30 strain or a variant of this strain,
and which are present in a biological sample which is obtained following an
infection with a non-M, non-O HIV-1 strain, and/or (2) of inducing the
production of anti-non-M, non-O HIV-1 antibodies.

Peptides of this type which may be mentioned are, in particular, those which
are derived from the YBF30 strain, in particular: that which is expressed by
the gag gene (SEQ ID No. 4), that which is expressed by the pol gene (SEQ ID
No. 6), that which is expressed by the vif gene (SEQ ID No. 8), that which is
expressed by the vpr gene (SEQ ID No. 10), that which is expressed by the vpu
gene (SEQ ID No. 12), that which is expressed by the tat gene (SEQ ID No. 14),
that which is expressed by the rev gene (SEQ ID No. 16), that which is
expressed by the env gene (SEQ ID No. 18), or one of its fragments such as a
fragment of the V3 loop region, i.e. CTRPGNNTGGQVQIGPAMTFYNIEKIVGDIRQAYC (SEQ
ID No. 58), and that which is expressed by the nef gene (SEQ ID No. 20), or a
fragment of these peptides which are capable of recognizing the antibodies
which are produced during an infection with a non-M, non-O HIV-1 as defined
above.

The invention also relates to immunogenic compositions which comprise one or
more translation products of the nucleotide sequences according to the
invention and/or one of the peptides as defined above, obtained, in particular,
by synthetic means.
The invention also relates to the antibodies which are directed against one or
more of the above-described peptides and to their use for implementing methods
for the in-vitro, in particular differential, diagnosis of the infection of an
individual with a virus of the HIV-1 type using methods which are known to the
skilled person.

The present invention encompasses all the peptides which are capable of being
recognized by antibodies which are isolated from an infectious serum which is
obtained after an infection with a non-M, non-O HIV-1 strain, and the peptides
which are capable of being recognized by an antibody according to the
invention.

The invention furthermore relates to a method for the in-vitro diagnosis of a
non-M, non-O HIV-1 virus, which method is characterized in that it comprises
bringing a biological sample, which has been taken from a patient, into contact
with antibodies according to Claim 10, which may possibly be combined with
anti-CPZGAB SIV antibodies, and detecting the immunological complexes which are
formed between the HIV-1 antigens, which may possibly be present in the
biological sample, and the said antibodies.

The invention also relates to a kit for diagnosing HIV-1, which kit is
characterized in that it includes at least one reagent according to the
invention.

Apart from the provisions which have been described above, the invention also
comprises other provisions which will be evident from the description which
follows and which refers to examples of implementing the method which is the
subject of the present invention and also to the attached drawings, in which:

- Figures 1 to 7 illustrate the location of the different primers on the
genome of the YBF30 strain;

- Figure 8 illustrates the genomic organization of the YBF30 strain;

- Figures 9 to 19 depict the phylogenetic analysis of the different genes of
the YBF30 strain as compared with group M HIV-1 and group O HIV-1 (Figure 9:
ltr gene, Figure 10: gag gene, Figure 11: tat gene, Figure 12: rev gene, Figure
13: vif gene, Figure 14: env gp120 gene, Figure 15: env gp41 gene, Figure 16:
nef gene, Figure 17: pol gene, Figure 18: vpr gene, Figure 19: vpu gene);

- Figure 20 illustrates the percentage genetic distance between YBF30 and
HIV-1/CPZGAB SIV.

It should of course be understood, however, that these examples are given
solely by way of illustrating the subject-matter of the invention, of which
they in no way constitute a limitation.

For the purposes of this specification it will be clearly understood that the
word "comprising" means "including but not limited to", and that the word
"comprises" has a corresponding meaning.

All references, including any patents or patent applications, cited in this
specification are hereby incorporated by reference. No admission is made that
any reference constitutes prior art. The discussion of the references states
what their authors assert, and the applicants reserve the right to challenge
the accuracy and pertinency of the cited documents. It will be clearly
understood that, although a number of prior art publications are referred to
herein, this reference does not constitute an admission that any of these
documents forms part of the common general knowledge in the art, in Australia
or in any other country.

EXAMPLE: Obtaining a non-M, non-O HIV-1 variant according to the invention
(YBF30) and its uses.

This was, in particular, possible in connection with studying the epidemiology
of infection with human acquired immunodeficiency viruses (HIV) in Cameroon,
which epidemiology is especially paradoxical. In this country, the diversity of
the strains is remarkable as most of the subtypes of the M (major) group of
HIV-1 viruses known to date have been reported. Cases of infection with highly
divergent HIV-1 viruses of the O group (O for outlier) have been reported,
almost exclusively in patients of Cameroonian origin. Cases of infection with
HIV-2, HTLV-1 and HTLV-2 subtypes A and B have also been reported.

Taking as a basis the results of previous serological and genotypic
assessments, the inventors established an algorithm for differentiating between
and confirming infections with HIV-1 viruses of the M and O groups in order to
select non-M, non-O variants.

These methods were applied to samples which were sent to the National
Reference Laboratory for HIV infections at Yaounde and made it possible to
characterize a highly divergent HIV isolate and to define the tools for
characterizing a new HIV-1 group, taking into account the homologies which were
observed between this human strain YBF30 and the simian strain CPZGAB SIV.

I - Way of serologically characterizing the YBF30 variant during the
epidemiological study.

1) Collecting the samples:

All the adult patient sera which were sent to the Yaounde reference laboratory
in 1994 and 1995 for detecting or confirming an HIV infection were studied
(n = 8831).

2) Differentiating serologically between group M and group O HIV-1, and
selecting variants:

If there was positive detection of anti-HIV antibodies (Génélavia Mixt
indirect mixed HIV-1 and HIV-2 EIA, Sanofi-Pasteur, Paris, France), this was
then combined with an EIA test based on the principle of competition with a
specific antigen of the M group (Wellcozyme Rec HIV-1, Murex, Dartford, UK).

If the competitive Wellcozyme Rec HIV-1 test is positive, with a ratio for the
reactivity in optical density (OD) as compared with the threshold or cut-off
(CO) value which is greater than 5 (CO/OD > 5), the serum is regarded as being
HIV-1-positive, a result which should be confirmed on a new sample.

The choice of a reactivity ratio which is greater than 5 for regarding the
competitive test as being a test for confirming infection with HIV-1 is based
on experience acquired by the virology laboratory of Bichat hospital: all of
7200 samples which reacted with a ratio > 5 gave a strongly positive HIV-1
Western blot (WB, New Lav Blot 1, SDP, Marnes la Coquette). Apart from cases of
HIV-1 seroconversion, the samples which are confirmed as being HIV-positive and
which give a Wellcozyme ratio of < 5 correspond either to infections with HIV-2
or to infections with O group HIV-1 or other HIV-1 variants.

In order to eliminate the false-positive reactions when carrying out a mixed
EIA detection, the samples which give a CO/OD ratio of < 5 are tested
systematically with a third generation mixed HIV-1/HIV-2 EIA (Enzygnost Plus,
Marburg, Germany) which includes antigens of the M and O HIV-1 groups
(recombinant gp41 of the MVP5180 strain). If this test is positive, a rapid
test which discriminates between HIV-1 and HIV-2 (Multispot, SDP, Marnes la
Coquette) and a Western blot (WB, New Lav Blot 1 or 2, SDP) are then carried
out.

3) Serologically confirming infections with O group HIV-1 and HIV-1 variants.

All the samples which give a CO/OD ratio of < 5, and which have been
differentiated as being positive by WB (positivity criteria: 2 ENV +/- POL +/-
GAG or 1 ENV + POL +/- GAG) and HIV-1, are tested with a dot blot test using
peptide antigens of the V3 and transmembrane regions (InnoLia, Innogenetics,
Ghent, Belgium).
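The serological selection described in steps 2) and 3) can be summarized as a small decision function. The Python sketch below is one reading of that procedure, not the inventors' own implementation: the class and function names, the result labels, the handling of a ratio exactly equal to 5, and the labelling of sera typed as HIV-2 by the rapid test are assumptions made for illustration.

```python
# Minimal sketch of the serological triage in steps 2) and 3).
# Names and result labels are hypothetical.
from dataclasses import dataclass
from typing import Optional


def western_blot_positive(env_bands: int, pol: bool) -> bool:
    """Criteria quoted in step 3): 2 ENV, or 1 ENV plus POL (GAG optional)."""
    return env_bands >= 2 or (env_bands >= 1 and pol)


@dataclass
class Serum:
    screening_positive: bool                    # mixed HIV-1/HIV-2 indirect EIA
    competitive_co_od: Optional[float] = None   # competitive EIA, CO/OD ratio
    third_gen_positive: bool = False            # third generation mixed EIA
    rapid_type: Optional[str] = None            # "HIV-1" or "HIV-2"
    env_bands: int = 0                          # ENV bands seen on Western blot
    pol: bool = False                           # POL band seen on Western blot


def triage(s: Serum, cutoff: float = 5.0) -> str:
    if not s.screening_positive:
        return "negative"
    if s.competitive_co_od is None:
        raise ValueError("competitive EIA ratio needed after a positive screen")
    if s.competitive_co_od > cutoff:
        return "HIV-1 positive, confirm on a new sample"
    # Ratio not above the cut-off (a ratio of exactly 5 is assumed to land here).
    if not s.third_gen_positive:
        return "screening reaction not confirmed"
    if s.rapid_type == "HIV-2":
        return "HIV-2"                          # assumed label
    if s.rapid_type == "HIV-1" and western_blot_positive(s.env_bands, s.pol):
        return "group O or variant candidate, test by V3/transmembrane dot blot"
    return "indeterminate"


# Illustrative serum with a low competitive ratio and a clearly positive blot.
example = Serum(True, 1.55, True, "HIV-1", env_bands=3, pol=True)
```

Calling `triage(example)` routes the serum to the dot blot step, which is the branch that selects group O sera and variants for further work.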

4) Retroviral isolation of the group O and variant strains.

The peripheral blood mononuclear cells (PBMC) 
from the seropositive patients were isolated by Ficoll- 
Hypaque gradient in Cameroon and then stored, and 
transported to Paris, in liquid nitrogen. 

After thawing, the PBMCs from the patients were 
cocultured together with lymphocytes from seronegative 
Caucasian donors. Viral replication in the culture 
supernatants was demonstrated by detecting reverse 
transcriptase activity and by carrying out tests for 
detecting the p24 antigen (Elavia p24 polyclonal, SDP) 
over a period of one month. 

5) Sequences:

The PCR products are visualized on agarose gels of from 1 to 1.4%
concentration, depending on the size of the fragments, precipitated in 3M
sodium acetate (1:10) and 3 volumes of absolute ethanol, incubated at -80°C for
30 minutes and then centrifuged at 13,000 rpm for 20 minutes. The pellet is
dried and then taken up in 10 µl of distilled water (Sigma). Purification is
carried out on a "Qiaquick Gel Extraction kit" (Qiagen) in accordance with the
manufacturer's instructions; the products are sequenced on an automated DNA
sequencer (Applied Biosystems, Inc., Foster City, CA) using an Applied
Biosystems Dye Terminator kit, as previously described (Loussert-Ajaka et al.,
1995); the nucleotide sequences are analysed on Sequence Navigator software
(Applied Biosystems), and aligned using GeneWorks software (Intelligenetics
Inc.).

6) Phylogenetic analyses:

The sequences were aligned using the CLUSTAL software for multiple alignments
and taking, as the reference matrix, the alignments of the compilation of HIV
sequences possessed by the Laboratory of Biology and Theoretical Biophysics,
Los Alamos, New Mexico, 87545 USA.

The phylogenetic analyses were performed using the PHYLIP software; the
distances were firstly calculated using DNADIST, after which the phylogenetic
analysis was carried out using NEIGHBOR JOINING or FITCH; finally, the trees
were drawn using DRAWTREE (Figures 9 to 19). The genetic distance percentages
are also shown in Figure 20.

SEQBOOT was first of all used for the "bootstrapping" analyses, followed by
DNADIST and NEIGHBOR JOINING or FITCH. Finally, the bootstrap values were
obtained using CONSENSE.

II - Results of the investigation for detecting group O and variant HIV
viruses:

174 samples, out of 3193 samples found to be positive in the screening, were
regarded as being group O or group M with abnormal serological reactivity or as
being variants.

III - Detection of a non-group O and non-group M sample exhibiting abnormal
serological reactivity

The 174 sera which were HIV-1-positive by WB (Western blot), but reactive with
a CO/OD ratio of < 5 in the competitive EIA, were tested by differential LIA
dot blot on the V3 peptides from group M, group O and CPZGAB SIV:
- 7 do not react with any of the peptides represented (M, O or CPZGAB SIV).
The absence of any cell collection does not allow any conclusion to be drawn.

- 82 give a reactivity with regard to at least one of the peptides
corresponding to the V3 loop of O group strains. The frequency of the
crossreactions is low and restricted to the epitopes which correspond to the
consensus V3 regions (11%) and to the CPZGAB SIV V3 regions (43%).

- 84 sera do not react with the O group epitopes. Most of these samples were
obtained from patients exhibiting an AIDS syndrome (75/84).

- one serum, which was taken from a Cameroonian patient (NJ), reacts
exclusively with the CPZGAB SIV peptide. This isolated reactivity with regard
to a CPZGAB SIV antigen has never been described previously. Since lymphocytes
had been collected from the patient, it was possible to continue with the
virological characterization of this strain, which was termed YBF30.
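The counts reported in sections I to III can be cross-checked with a few lines of Python. The sketch below only restates numbers given in the text (8831 sera studied, 3193 positive on screening, 174 atypical sera and their four dot blot categories); the variable names and the helper `pct` are illustrative.

```python
# Minimal sketch: consistency check of the counts reported in the text.

sera_studied = 8831        # adult sera received in 1994 and 1995
screen_positive = 3193     # found positive in the screening
atypical = 174             # WB-positive, competitive ratio below 5

dot_blot_groups = {
    "no reaction with M, O or CPZGAB SIV V3 peptides": 7,
    "reactive with at least one group O V3 peptide": 82,
    "no reaction with group O epitopes": 84,
    "reactive with the CPZGAB SIV V3 peptide only": 1,
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The four categories account for every atypical serum.
categories_total = sum(dot_blot_groups.values())

if __name__ == "__main__":
    print("categories sum to", categories_total, "of", atypical)
    print("atypical among screen-positive sera (%):", pct(atypical, screen_positive))
    print("AIDS among the 84 non-O-reactive sera (%):", pct(75, 84))
```

The four categories add up to the 174 atypical sera, so the single CPZGAB-only serum is not double counted in any other group.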

IV - Results of the serological and virological examinations performed on the
first samples taken from this patient (May 1995) (serum No.: 95-6295):

1) Commercial ELISA tests (optical density/threshold value)

Criterion of positivity: OD/CO > 1

Génélavia = > 15
Wellcozyme CO/OD = 1.55
Abbott Plus = > 15
Behring Plus = 4.2

2) Western blot

New Lav 1 Pasteur WB:

160++, 120++, 68++, 55+, 41+, 40+/-, 34++, 24++, 18+

3) Innogenetics LIA dot blot

Negative for all the group O and group M bands apart from CPZGAB SIV V3
4) Results of the investigative serological examinations carried out on
peptides which are specific for the M and O groups

The technique developed by Professor Francis Barin of the Virology Laboratory
of the Tours CHU was modified (Barin F. et al., 1996); use was made of
synthesized transmembrane region peptides (BioMérieux) for developing a test
for differentiating between the M and O groups. This technique is based on
antibody-binding competition between the transmembrane gp41 peptides of the O
and M groups, which are deposited on the solid phase, and gp41 transmembrane
peptides either of the O group or of the M group at higher concentration in a
hyperosmolar liquid reaction phase. The results are shown in Table I below, in
which the CP well corresponds to the 100% inhibition control and the CSP well
corresponds to the 0% inhibition control.



Table I

Results of the inter-group O-group M differentiations for the 6295 serum

Serum   gp41 M   gp41 O   CP     CSP
6295    0.25     0.36     0.12   1.98


These results demonstrate that there is strong binding with regard to the
peptides of the solid phase (CSP) and a marked inhibition due to the combined
addition of the M and O peptides (CP), but no clear differentiation either by
the M peptide or by the O peptide. This is, therefore, serological evidence
that the infecting strain does not belong either to the M group or to the O
group.

In view of an isolated reactivity in the InnoLia dot blot with regard to the
CPZGAB SIV V3 antigens, on the same bases of competition between peptides, this
serum was studied by bringing into competition the gp41 M, gp41 O and gp41
CPZGAB SIV peptides.

Use of the serum from the chimpanzee named "Amandine" (donated by M. Peeters,
who isolated the CPZGAB SIV strain, AIDS 1992) initially enabled this technique
to be validated. In Table II, the lowest values (OD) indicate the highest
degree of binding to the antigens.

Table II

Results of the inter-group O-group M-CPZGAB SIV differentiations using the
Amandine chimpanzee serum and the 6295 serum

Serum      gp41 M   gp41 O   gp41 CPZGAB   CP    CSP
Amandine   0.8      1.4      0.3           0.5   1.9
6295       0.7      1.1      0.7           0.4   2.1


The reactivity of the 'Amandine' serum confirms
and validates the test according to the invention and
shows that, while the serum of the patient reacts
identically with regard to the M and CPZGAB SIV
peptides, it does not exhibit a cross-reaction with the
O peptide.

These results demonstrate that the group M gp41 
and CPZGAB SIV gp41 peptides exert a similar inhibition 
on the serum of the patient. The antigens of the 
infecting strain have therefore given rise to 
antibodies which recognize the group M and CPZGAB SIV 
gp41 peptides in a similar manner. 

4) Results obtained from the lymphocyte 
isolation (sampling of May 1995) 

A retrovirus was isolated, using standard 
techniques, from the lymphocytes which were sampled on 
22 May 1995. Culture using the MT2 cell line shows that 
the YBF30 strain does not form any syncytia (NSI).

V - Results of the serological examinations carried out
on the second blood sample (November 1995) (serum No.
95-3371)

1) Innogenetics LIA dot blot 

Negative for all the bands, apart from CPZGAB SIV V3.

2) Results of the investigative serological
examinations carried out on the peptides specific for
the M and O groups.

Table III shows the results of the inter-group
O-group M-CPZGAB SIV gp41 differentiations using the
3371 serum.



Table III

Results of the inter-group O-group M-CPZGAB SIV gp41
differentiations using the 3371 serum

Serum    gp41 M    gp41 O    gp41 CPZGAB    CP      CSP
3371     1.31      1.7       0.89           0.54    2.02



These results confirm, on this new blood sample
(taken from the same patient in the terminal stage of
the disease), that the CPZGAB SIV gp41 peptide markedly
inhibits the serum of the patient.

The antigens of the infecting strain have 
therefore induced antibodies which preferentially 
recognize the CPZGAB SIV gp41 peptide. 

3) Results from the lymphocyte isolation (blood
sampling of November 95 (95-3371-YBF31))

A retrovirus was isolated, using the standard
techniques, from the lymphocytes which were sampled in
November 1995 and termed YBF31; the sequence elements
are identical to those of YBF30.

VI - Genomic amplification and sequences of YBF30 

The DNA for all the PCR manipulations is
extracted from the cells obtained at the end of a
positive culture.

The PCRs carried out using the O group HIV-1
primers are negative in the different regions tested
(gag, pol, env). Similarly, those carried out using the
primers which are specific for M group HIV-1 are also
negative.

The amplification and hybridization conditions
for the O group PCRs are those described in
Loussert-Ajaka, 1995. The amplification and hybridization
conditions for the M group PCRs are those described by
the authors cited below.

These M group primers are located in accordance
with the HIV-1 HXB2 sequence as follows:

- in env gp120: ED3/ED12 (position 5956-5985;
7822-7792); ED5/ED14 (6556-6581; 7960-7931); ED5/ED12;
ED3/ED14; ES7/ES8 (7001-7020; 7667-7647) (Delwart et
al., Science 1993; 262: 1257-1261).

- in env gp41: first PCR, ED3/M29, followed by
a nested PCR, M28/M29 (7785-7808; 8099-8124); M28/M29
have the following sequences:

M28: CGGTTCTT(AG)GGAGCAGC(ACT)GGAAGCA,

M29: T(CT)T(ACGT)TCCCA(CT)T(AT)(CT)A(AGT)CCA(AGT)GTCAT;

SK68/SK69 (Ou et al., Science, 1988; 239: 295-297).

- in gag: Amplicor Roche Diagnostics systems;
nested gag primers (Loussert-Ajaka et al., Lancet 1995;
346: 912-913); SK38/SK39 (Ou et al., Science, 1988;
239: 295-297).

- in pol: A/NE1 (Boucher et al., Lancet, 1990;
336: 585-590); Pol3/Pol4 (Laure et al., Lancet, 1988,
ii, 538-541).
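
The M28 and M29 primers listed above are degenerate: each parenthesized group gives the alternative bases allowed at one position. The sketch below shows how such a notation can be expanded into the individual oligonucleotides it stands for. The helper names (`parse_positions`, `degeneracy`, `expand`) are illustrative and not taken from the text; only the two primer strings come from it.

```python
import re
from itertools import product

# Primer strings as given in the text; "(AG)" means A or G at that position.
M28 = "CGGTTCTT(AG)GGAGCAGC(ACT)GGAAGCA"
M29 = "T(CT)T(ACGT)TCCCA(CT)T(AT)(CT)A(AGT)CCA(AGT)GTCAT"


def parse_positions(primer):
    """Split a primer into per-position choices, e.g. 'A(CT)' -> ['A', 'CT']."""
    pattern = re.compile(r"\(([ACGT]+)\)|([ACGT])")
    return [m.group(1) or m.group(2) for m in pattern.finditer(primer)]


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for choices in parse_positions(primer):
        count *= len(choices)
    return count


def expand(primer):
    """List every concrete sequence covered by the degenerate primer."""
    return ["".join(bases) for bases in product(*parse_positions(primer))]


print(degeneracy(M28), degeneracy(M29))
print(expand(M28))
```

M28 carries two degenerate positions (2 x 3 = 6 sequences), whereas M29 carries seven (2 x 4 x 2 x 2 x 2 x 3 x 3 = 576 sequences).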

Only the PCR carried out using the HPOL
primers (4235/4538), followed by a nested PCR using the
primers 4327/4481 (Fransen et al., Molecular and
Cellular Probes 1994; 8: 317-322), is positive. This
HPOL fragment, which is located in the integrase
(260 bp), has been sequenced. Amplification using the
HPOL primers is made possible by the excess of virus,
because the DNA which is used is extracted from cells
at the end of a strongly positive culture (reverse
transcriptase > 100,000 cpm). It is not possible to
amplify the DNA which is extracted from fresh cells
without coculture, because of the large number of
mispairings between the HPOL primers (especially in
the 3' region) and the sequence of the YBF30 isolate.
Conservation of this 3' end is very important for the
extension activity of the Taq polymerase.
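
The point about mispairings at the 3' end can be illustrated with a short sketch that lists where a primer differs from its binding site and which of those differences fall in the last few 3' bases. The two sequences, the five-base window and the function names are hypothetical and chosen only for illustration; they are not the HPOL primers or the YBF30 sequence.

```python
def mismatch_positions(primer, site):
    """Return 0-based positions (5'->3' along the primer) that differ.

    Both strings are written 5'->3' in the primer's orientation, so a
    perfectly matched site is identical to the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]


def three_prime_mismatches(primer, site, window=5):
    """Mismatches that fall within the last `window` bases at the 3' end."""
    start = len(primer) - window
    return [i for i in mismatch_positions(primer, site) if i >= start]


# Hypothetical 20-mer and binding site, for illustration only.
primer = "ACGTACGTACGTACGTACGT"
site = "ACGTACGTACGTACGTAGGA"

print(mismatch_positions(primer, site))
print(three_prime_mismatches(primer, site))
```

A primer with only two mismatches overall may still fail to extend if, as in this toy case, both of them lie in the 3' terminal window.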

1 - Sequence of the pol gene: the use of very
degenerate primers for amplifying, by RT-PCR, the RNA
extracted from the positive culture supernatant gave a
positive amplification. These are primers which are
common to all retroviruses (Donehower et al., J. Virol.
Methods 1990; 28: 33-46), and are located in the
reverse transcriptase region of the pol gene. Analysis
of the fragment after sequencing made it possible to
generate a specific primer, i.e. YRT2 (SEQ ID No. 32),
from the YBF30 isolate and to amplify the pol gene
using the HPOL 4481 primer (Fransen et al., 1994, loc.
cit.) as the antisense primer. The fragment was
sequenced by synthesizing specific primers as required
for each fragment generated (Figure 1).

2 - Sequence of the env gene: the second
approach was to perform a long PCR (XL-PCR, Perkin
Elmer), thereby amplifying all the virus (9000 bp)
using primers situated in the LTR: LPBS 1 (SEQ ID
No. 22); LSiGi, followed by a 6000 bp nested PCR using
YRT2 (SEQ ID No. 32)/SK69, and to sequence all the
envelope following the same procedure. The gp41 region
was sequenced using a nested PCR and employing the
primers SK68/LSiGi.

3 - Sequence of the gag gene: use of a nested
PCR, achieved by means of a long PCR (LPBS 1/LSiGi),
employing the primers Gag 5 and Gag Hi, and generating
from this specific primers, as required, in order to
walk along the viral genome.

VII - Results of the sequencings 

The strain YBF30 was sequenced completely (see
list of sequences). The YBF31 strain of November 1995
was sequenced in part, and the absence of significant
variation confirms the validity of the YBF30 sequences.

VIII - Synthesizing peptides of the V3 loop region of
the YBF30 strain.

Studying the sequences of the V3 loop region
made it possible to synthesize the corresponding
peptide and to compare the amino acids of this region
of the YBF30 strain with those of other M subtypes and
O strains.




The sequences of the peptides are:

YBF30: SEQ ID No. 58
CPZGAB SIV: CHRPGNNTRGEVQIGPGMTFYNIENVYGDTRSAYC
(SEQ ID No. 59)
GROUP O: CIRPGNRTYRNLQIGPGMTFYNVEIATGDIRKAFC
(ANT70) (SEQ ID No. 60)
GROUP M: CTRPNNNTRKSVRIGPGQAFYATGDIIGDIRQAHC
(SUBTYPE A) (SEQ ID No. 61)
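
The comparison of V3 loop amino acids can be sketched as a position-by-position identity count. The three reference peptides written out above are all 35 residues long, so this sketch assumes a gapless comparison with no alignment step; that simplification, the dictionary keys and the function name `identity` are assumptions for illustration. The YBF30 peptide is given only by reference (SEQ ID No. 58) and is therefore not included.

```python
# V3 loop peptides as written out in the text (SEQ ID Nos. 59 to 61).
V3_LOOPS = {
    "CPZGAB SIV": "CHRPGNNTRGEVQIGPGMTFYNIENVYGDTRSAYC",
    "GROUP O (ANT70)": "CIRPGNRTYRNLQIGPGMTFYNVEIATGDIRKAFC",
    "GROUP M": "CTRPNNNTRKSVRIGPGQAFYATGDIIGDIRQAHC",
}


def identity(a, b):
    """Return (matching positions, fraction identical) for equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length for a gapless comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, matches / len(a)


names = list(V3_LOOPS)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        matches, fraction = identity(V3_LOOPS[first], V3_LOOPS[second])
        print(f"{first} vs {second}: {matches}/35 identical ({fraction:.0%})")
```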

The peptide was synthesized, starting with the
two asparagines of the 5' region of the loop, and used
in accordance with the same principle as previously
described (see IV 4)), namely in competition in
relation to the peptides of the M group, the O group
and CPZGAB SIV. The results shown in Table IV confirm
the original nature of this strain and the possible
spread of these strains, since the serological results
favour infection of the YBF30 type in Cameroon.
Furthermore, a study of 200 selected HIV-1-positive
sera from Cameroon provides evidence of a new case
exhibiting a profile which is similar to that of YBF30.

Table IV

Study of the reactivity of 200 sera

Serum       Origin      V3A     V3cpz   V3YBF30   CP      CSP
953371      Cameroon    1.66    0.38    1.39      0.39    1.64
956295      Cameroon    1.72    0.37    1.16      0.51    1.73
967321      Cameroon    0.07    0.17    0.5       0.05    0.27
Amandine    GAB SIV     1.74    0.14    1.48      0.19    1.74
NOA.*       ANT SIV     2.66    0.31    1.88      0.46    1.9

* serum from CPZ ANT SIV



The reactivity of the sera 953371 and 956295,
corresponding to the patient from whom the YBF30 strain
was isolated, with the CPZ SIV peptide, was confirmed
in this new test. The lower reactivity with regard to
its own V3 antigen is usual during the late stages of
the disease. Nevertheless, this reactivity remains
greater than that raised with regard to the M peptide.
Another Cameroonian patient (serum 967321) exhibits the
same profile of peptide reactivity.

As is evident from the above, the invention is
in no way limited to those of its embodiments which
have just been described more explicitly; on the
contrary, it encompasses all the variants which may
come to the mind of the skilled person without
departing from the context or scope of the present
invention.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: INSTITUT NATIONAL DE LA SANTE ET DE LA RECHERCHE MEDICALE - INSERM
(B) STREET: 101 rue DE TOLBIAC
(C) TOWN: PARIS
(E) COUNTRY: FRANCE
(F) ZIP CODE: 75654 CEDEX 13

(A) NAME: ASSISTANCE PUBLIQUE-HOPITAUX DE PARIS
(B) STREET: 3 avenue Victoria
(C) TOWN: PARIS
(E) COUNTRY: FRANCE
(F) ZIP CODE: 75100 RP

(A) NAME: INSTITUT PASTEUR
(B) STREET: 28 rue du Docteur Roux
(C) TOWN: PARIS
(E) COUNTRY: FRANCE
(F) ZIP CODE: 75724 Cedex 15

(A) NAME: MAUCLERE Philippe
(B) STREET: 2 rue Buhan
(C) TOWN: BORDEAUX
(E) COUNTRY: FRANCE
(F) ZIP CODE: 33000

(A) NAME: LOUSSERT-AJAKA Ibtissam
(B) STREET: 26 avenue de la Republique
(C) TOWN: SARTROUVILLE
(E) COUNTRY: FRANCE
(F) ZIP CODE: 78500

(A) NAME: SIMON Francois
(B) STREET: 8 rue Germain Pilon
(C) TOWN: PARIS
(E) COUNTRY: FRANCE
(F) ZIP CODE: 75018

(A) NAME: SARAGOSTI Sentob
(B) STREET: 69 bis rue de Billancourt
(C) TOWN: BOULOGNE BILLANCOURT
(E) COUNTRY: FRANCE
(F) ZIP CODE: 92100

(A) NAME: BARRE-SINOUSSI Françoise
(B) STREET: 104 Le Capricorne, 50 rue d'Erevan
(C) TOWN: ISSY LES MOULINEAUX
(E) COUNTRY: FRANCE
(F) ZIP CODE: 92130

(ii) TITLE OF INVENTION: NON-M NON-O, HIV STRAINS,
FRAGMENTS AND USE.

(iii) NUMBER OF SEQUENCES: 61

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9183 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



CTTCTCGCTT GTACTGGGTC TCTCTTGCTG GACCAGATTA GAGCCTGGGA GCTCTCTGGC 60
TAGCAGGGAA CCCACTGCTT AAGCCTCAAT AAAGCTTGCC TTGAGTGCTA AAGTGGTGTG 120
TGCCCATCCA TTCGGTAACT CTGGTACCTA GAGATCCCTC AGACCATCTA GACTGAGTGA 180
AAAATCTCTA GCAGTGGCGC CCGAACAGGG ACTTGAAAAC GAAAGTAGAA CCGGAGGCTG 240
AATCTCTCGA CGCAGGACTC GGCTCGTTGG TGCACACAGC GAGAGGCGAG GCGGCGGAAG 300
TGTGAGTACG CAATTTTGAC TGGCGGTGGC CAGAAAGTAG GAGAGAGGAT GGGTGCGAGA 360
GCGTCAGTGT TAACAGGGGG AAAATTAGAT CAATGGGAAT CAATTTATTT GAGACCAGGG 420
GGAAAGAAAA AATACAGAAT GAAACATTTA GTATGGGCAA GCAGGGAGCT GGAAAGATTC 480
GCTTGTAACC CAGGTCTCAT GGACACAGCG GACGGCTGTG CCAAGTTACT AAATCAATTA 540
GAACCAGCTC TCAAGACAGG GTCAGAAGAA CTGCGCTCTT TATATAACGC TCTAGCAGTT 600
CTTTATTGTG TCCATAGTAG GATACAGATA CACAACACAC AGGAAGCTTT GGACAAGATA 660
AAAGAGAAAC AGGAACAGCA CAAGCCCGAG CCAAAAAACC CAGAAGCAGG GGCAGCGGCA 720
GCAACTGATA GCAATATCAG TAGGAATTAT CCTCTAGTCC AGACTGCTCA AGGACAAATG 780
GTACATCAGC CGCTGACACC CAGAACCTTA AATGCTTGGG TGAAAGTGAT AGAGGAGAAG 840
GCCTTTAGTC CAGAAGTAAT ACCAATGTTT ATGGCCTTGT CAGAAGGGGC AACGCCCTCA 900
GATCTAAATA CTATGTTAAA TACAGTAGGG GGACATCAGG CAGCAATGCA GATGCTGAAG 960
GAAGTCATCA ATGAGGAAGC AGCAGACTGG GATAGGACAC ATCCAGTCCC TGTGGGACCA 1020
CTACCCCCAG GGCAACTGAG AGACCCTAGA GGAAGTGATA TAGCAGGAAC AACTAGCACC 1080
CTGGCAGAAC AGGTGGCTTG GATGACTGCT AATCCTCCTG TTCCAGTAGG AGATATTTAT 1140
AGAAGATGGA TAGTCCTGGG GTTAAACAGA ATTGTGAGAA TGTATAGTCC TGTCAGCATT 1200 
CTAGAGATCA AACAAGGACC AAAAGAACCC TTCAGAGACT ATGTAGACAG GTTCTACAAA 1260 
ACTCTAAGAG CAGAGCAGGC AACACAGGAA GTAAAGAATT GGATGACAGA AACACTCTTA 1320 
GTACAAAATG CAAACCCAGA TTGTAAACAG CTCCTAAAAG CATTAGGGCC AGGAGCTACC 1380 
TTAGAAGAGA TGATGACGGC CTGCCAGGGA GTGGGGGGAC CAGCACATAA GGCAAGAGTG 1440
CTAGCAGAGG CTATGTCACA GGTGCAGCAG CCAACAACTA GTGTCTTTGC ACAAAGGGGA 1500 
AACTTTAAAG GCATAAGGAA ACCCATTAAA TGTTTCAATT GTGGCAAAGA GGGCCATTTG 1560 
GCAAGAAACT GTAAGGCCCC TAGAAGAGGA GGCTGTTGGA AGTGTGGGCA AGAAGGACAT 1620 
CAAATGAAAG ATTGTAAAAA TGAAGGAAGA CAGGCTAATT TTTTAGGGAA GAGCTGGTCT 1680 
CCCTTCAAAG GGAGACCAGG AAACTTCCCC CAGACAACAA CAAGGAAAGA GCCCACAGCC 1740
CCGCCACTAG AGAGTTATGG GTTTCAGGAG GAGAAGAGCA CACAGGGGAA GGAGATGCAG 1800 
GAGAACCAGG AGAGGACAGA GAACTCTCTG TACCCACCTT TAACTTCCCT CAGATCACTC 1860 
TTTGGCAACG ACCCGTCATC ACAGTAAAAA TAGGGAAAGA AGTAAGAGAA GCTCTTTTAG 1920 
ATACAGGAGC TGATGATACA GTAATAGAAG AGCTACAATT AGAGGGAAAA TGGAAACCAA 1980 
AAATGATAGG AGGAATTGGA GGATTTATCA AAGTGAGACA ATATGATAAT ATAACAGTAG 2040
ACATACAGGG AAGAAAAGCA GTTGGTACAG TATTAGTAGG ACCAACACCT GTTAATATTA 2100 
TAGGAAGAAA TCTTTTAACC CAGATTGGCT GTACTTTAAA TTTTCCAATA AGTCCTATTG 2160 
AAACTGTACC AGTAAAATTA AAACCAGGAA TGGATGGCCC AAAGGTAAAA CAATGGCCTT 2220 
TGACAACAGA AAAAATAGAG GCATTAAGAG AAATTTGTAC AGAAATGGAA AAGGAAGGAA 2280 
AAATTTCTAG AATAGGGCCT GAGAATCCAT ATAACACTCC AATTTTTGCT ATAAAAAAGA 2340 
AAGATAGCAC TAAATGGAGA AAATTAGTAG ATTTCAGGGA ATTAAATAAA AGGACCCAAG 2400 
ATTTTTGGGA AGTGCAGCTA GGAATTCCAC ATCCAGCAGG ATTAAAGCAG AAAAAATCAG 2460
TGACAGTTTT GGATGTAGGA GATGCTTATT TTTCATGTCC CTTGGACAAA GATTTTAGAA 2520
AGTATACAGC TTTTACCATA CCTAGTATAA ACAATGAGAC ACCTGGTATT AGATACCAGT 2580
ATAATGTGCT GCCACAAGGC TGGAAAGGGT CACCAGCAAT TTTTCAGAGT ACAATGACAA 2640 
AAATTCTAGA ACCATTCAGA GAGAAACATC CAGAGATAAT CATTTACCAG TACATGGATG 2700 
^ST/ : t^XcCTCTATGT GGGATCTGAC TTAGAACTAG CACAACATAG AGAGGCAGTA GAAGACCTTA 2760 




GAGATCATCT TTTGAAGTGG GGCTTTACGA CCCCTGACAA AAAACATCAG AAGGAACCCC 2820
CGTTCCTCTG GATGGGATAT GAACTCCATC CAGACAAATG GACAGTCCAG CCAATAAAGT 2880
TACCAGAAAA GGATGTATGG ACTGTCAATG ATATACAGAA ATTAGTAGGA AAGTTAAATT 2940
GGGCAAGTCA GATCTATCCA GGAATCAGAG TAAAACAGCT CTGTAAATTA ATCAGAGGAA 3000
CCAAAGCTTT GACAGAAGTA GTCAACTTTA CAGAAGAAGC AGAATTAGAA CTAGCAGAAA 3060
ACAGGGAGAT ATTAAAAGAA CCCCTGCATG GAGTCTATTA TGACCCAGGA AAAGAATTAG 3120
TAGCAGAAAT TCAAAAGCAA GGACAAGGTC AGTGGACATA TCAGATTTAT CAGGAGTTAC 3180
ATAAAAATTT AAAAACAGGA AAGTATGCAA AAATGAGATC TGCCCATACT AATGATATAA 3240
AACAGTTAGT TGAAGTGGTA AGGAAAGTGG CAACAGAAAG TATAGTAATT TGGGGAAAGA 3300
CTCCTAAATT TAGATTACCA GTACAAAAGG AAGTGTGGGA GGCATGGTGG ACCGATCATT 3360
GGCAAGCAAC TTGGATTCCT GAGTGGGAAT TTGTCAACAC TCCTCCCCTT GTAAAATTAT 3420
GGTATCAGTT AGAAACAGAG CCAATCAGTG GGGCAGAAAC TTTCTATGTA GATGGAGCAG 3480
CTAATAGGGA AACAAAATTG GGAAAAGCAG GTTTTGTGAC AGATAGGGGA AGACAGAAAG 3540
TGGTCTCTAT TGCAGACACC ACCAATCAAA AGGCTGAGTT ACAAGCTATC CTTATGGCCT 3600
TACAAGAGTC AGGACGGGAT GTAAACATAG TCACTGACTC TCAGTATGCT ATGGGAATAA 3660
TTCATTCACA GCCAGATAAA AGTGAATCAG AATTGGTGAG CCAAATAATA GAAGAGCTCA 3720
TAAAAAAGGA AAGAGTTTAT CTCTCTTGGG TACCTGCACA TAAAGGTATT GGAGGAAATG 3780
AGCAGGTAGA CAAATTAGTT AGCTCAGGAA TTAGAAAAAT ATTATTCCTA GATGGTATAG 3840
AAAAAGCCCA AGAAGATCAT GACAGATATC ACAGCAATTG GAAAGCAATG GCCAGTGATT 3900
TTAACTTACC CCCCATAGTG GCAAAAGAAA TAGTAGCCAG CTGTGACAAA TGCCAGCTAA 3960
AAGGGGAAGC CATGCATGGA CAGGTCAATT GTAGTCCAGG AGTGTGGCAA TTAGATTGTA 4020
CACACTTAGA GGGAAAAATC ATCCTTGTGG CGGTCCATGT GGCCAGTGGC TACTTAGAAG 4080
CAGAAGTTAT TCCTGCAGAG ACAGGACAGG AAACAGCATA TTTTATTTTA AAGTTAGCTG 4140
GAAGATGGCC AGTAAAAGTT ATACACACTG ATAATGGATC CAATTTCACT AGTGCCACTG 4200
TAAAAGCAGC CTGTTGGTGG GCAAATATCA AACAGGAATT TGGGATACCC TACAATCCTC 4260
AAAGTCAGGG AGCAGTAGAG TCCATGAATA AAGAATTAAA GAAAATTATA GGACAAATCA 4320
GAGATCAAGC AGAACATCTA AAGACAGCAG TGCAAATGGC GGTTTTCATT CACAATTTTA 4380
AAAGAAAAGG GGGGATTGGG GGGTACACTG CAGGGGAAAG AATAATAGAC ATAATAGCAA 4440
CAGACATACA GACAACAAAT TTACAAACAC AAATTTTAAA AGTTCAAAAT TTTCGGGTTT 4500 
ATTACAGAGA CAGCAGAGAT CCCATTTGGA AAGGACCAGC CAAACTTCTG TGGAAAGGAG 4560 
AAGGGGCAGT GGTAATTCAA GATAACGGGG ATATAAAAGT AGTCCCACGT AGGAAAGCAA 4620
AAATAATTAG GGATTATGGA AAACAGATGG CAGGTGATGG TTGTGTGGCA AGTGGACAGG 4680
ATGAAAATCA GGAAATGGAA TAGCTTAGTA AAACATCATA TGTATGTGTC AAAAAAGGCA 4740
AAAGGATGGT ATTATAGACA TCATTATGAA ACACATCACC CAAAAATAAG TTCAGAAGTA 4800
CATATCCCAG TAGGTCAGGC AAGATTAGTG ACAGTCACTT ATTGGGGGCT AACAACAGGA 4860
GAACAGTCTT GGCATCTAGG ACATGGAGTA TCCATAGAAT GGAGACTAAG AAAATACAAG 4920
ACACAAGTTG ATCCTGAAAT GGCAGACAAG CTAATACATC TTCATTATTT TGATTGTTTT 4980
ACAGCCTCTG CCATAAGGCA AGCGGTCTTA GGGAGACCAG TATTACCTAG GTGTGAATAT 5040
CCAGCAGGGC ACAAACAGGT AGGCACCCTA CAATATCTAG CACTAACAGC CTGGGTGGGA 5100 
GCAAAGAAGA GAAAGCCACC CTTACCTAGT GTGACTAAGC TAACAGAAGA TAGATGGAAC 5160 
GAGCACCAGA AGATGCAGGG CCACAGAGGG AACCCTATAA TGAATGGGCA CTAGAATTAT 5220 
TAGAAGAATT AAAAAATGAA GCTGTGCGCC ATTTTCCAAG GATTTGGCTA CATGGGTTAG 5280 
GACAACACAT CTATAACACA TATGGAGACA CCTGGGAGGG GGTAGAGGCA ATTATCAGGA 5340
TACTACAACA ATTACTGTTT ATCCATTATA GGATTGGCTG CCAGCACAGC AGAATAGGGA 5400 
TCACTCCTCA AAGGAGAAGG AATGGAACCA GTAGATCCTA GATTAGAGCC CTGGAATCAT 5460 
CCAGGAAGCC AACCTAAAAC AGCTTGCAAT AATTGCTATT GTAAAAGATG TTGCTATCAC 5520 
TGCTTATATT GCTTCACAAA GAAAGGCTTA GGCATCTCAT ATGGCAGGAA GAAGCGGAGT 5580 
CAACGACGAA GAACTCCTCA GAGCAGTAAG AGTCATCAAG ATCTTATACC AGAGCAGTAA 5640
GTAAAACCTG TATATATGCT GTCATTGGGA TTCATAGCGT TAGGAGCAGC AGTTAGCATA 5700
GCAGTAATAG TCTGGGCATT ACTATATAGA GAATATAAGA AAATAAAATT GCAGGAAAAA 5760 
ATAAAACACA TAAGACAGAG AATAAGAGAA AGAGAAGAAG ATAGTGGCAA TGAAAGTGAT 5820 
GGGGATGCAG AGTGGTTGGA TGGGGATGAA GAGTGGTTGG TTACTCTTCT ATCTTCTAGT 5880
AAGCTTGATC AAGGTAATTG GGTCTGAACA ACATTGGGTA ACAGTGTACT ATGGGGTACC 5940
AGTATGGAGA GAAGCAGAGA CAACTCTTTT CTGTGCTTCA GATGCTAAAG CCCATAGTAC 6000 
AGAGGCTCAC AACATCTGGG CCACACAAGC ATGTGTTCCT ACTGATCCCA ATCCACAAGA 6060
AGTGCTATTA CCCAATGTAA CTGAAAAATT TAATATGTGG GAAAATAAAA TGGCAGACCA 6120
AATGCAAGAG GATATTATCA GTCTGTGGGA ACAGAGCTTA AAGCCCTGTG TTAAATTAAC 6180
CCCATTATGT GTAACTATGC TTTGTAACGA TAGCTATGGG GAGGAAAGGA ACAATACAAA 6240
TATGACAACA AGAGAACCAG ACATAGGATA CAAACAAATG AAAAATTGCT CATTCAATGC 6300
AACCACTGAG CTAACAGATA AAAAGAAGCA AGTTTACTCT CTGTTTTATG TAGAAGATGT 6360
AGTACCAATC AATGCCTATA ATAAAACATA TAGGCTAATA AATTGTAATA CCACAGCTGT 6420
GACACAAGCT TGTCCTAAGA CTTCCTTTGA GCCAATTCCA ATACATTACT GTGCACCACC 6480
AGGCTTTGCC ATTATGAAAT GTAATGAAGG AAACTTTAGT GGAAATGGAA GCTGTACAAA 6540
TGTGAGTACT GTACAATGCA CACATGGAAT AAAGCCAGTG ATATCCACTC AGTTAATCCT 6600
AAATGGAAGC TTAAATACAG ATGGAATTGT TATTAGAAAT GATAGTCACA GTAATCTGTT 6660
GGTGCAATGG AATGAGACAG TGCCAATAAA TTGTACAAGG CCAGGAAATA ATACAGGAGG 6720
ACAGGTGCAG ATAGGACCTG CTATGACATT TTATAACATA GAAAAAATAG TAGGAGACAT 6780
TAGACAAGCA TACTGTAATG TCTCTAAAGA ACTATGGGAA CCAATGTGGA ATAGAACAAG 6840
AGAGGAAATA AAGAAAATCC TGGGGAAAAA CAACATAACC TTCAGGGCTC GAGAGAGGAA 6900
TGAAGGAGAC CTAGAAGTGA CACACTTAAT GTTCAATTGT AGAGGAGAGT TTTTCTATTG 6960
TAACACTTCC AAATTATTTA ATGAGGAATT ACTTAACGAG ACAGGTGAGC CTATTACTCT 7020
GCCTTGTAGA ATAAGACAGA TTGTAAATTT GTGGACAAGG GTAGGAAAAG GAATTTATGC 7080
ACCACCAATT CGGGGAGTTC TTAACTGTAC CTCCAATATT ACTGGACTGG TTCTAGAATA 7140
TAGTGGTGGG CCTGACACCA AGGAAACAAT AGTATATCCC TCAGGAGGAA ACATGGTTAA 7200
TCTCTGGAGA CAAGAGTTGT ATAAGTACAA AGTAGTTAGC ATAGAACCCA TAGGAGTAGC 7260
ACCAGGTAAA GCTAAAAGAC GCACAGTGAG TAGAGAAAAA AGAGCAGCCT TTGGACTAGG 7320
TGCGCTGTTT CTTGGGTTTC TTGGAGCAGC AGGGAGCACT ATGGGCGCAG CGTCAATAAC 7380
GCTGACGGTA CAGGCCCGGA CATTATTATC TGGGATAGTG CAACAGCAGA ATATTCTGTT 7440
GAGAGCAATA GAGGCGCAAC AACATTTGTT GCAACTCTCA ATCTGGGGCA TTAAACAGCT 7500
CCAGGCAAAA GTCCTTGCTA TAGAAAGATA CCTTAGGGAT CAGCAAATCC TAAGTCTATG 7560
GGGCTGCTCA GGAAAAACAA TATGCTATAC CACTGTGCCT TGGAATGAGA CTTGGAGCAA 7620
CAATACCTCT TATGATACAA TCTGGAATAA TTTAACCTGG CAACAATGGG ATGAGAAAGT 7680
AAGAAACTAT TCAGGTGTCA TTTTTGGACT TATAGAACAG GCACAAGAAC AACAGAACAC 7740
AAATGAGAAA TCACTCTTGG AATTGGATCA ATGGGACAGT CTGTGGAGCT GGTTTGGTAT 7800 
TACAAAATGG CTGTGGTATA TAAAAATAGC TATAATGATA GTAGCAGGCA TTGTAGGCAT 7860 
AAGAATCATA AGTATAGTAA TAACTATAAT AGCAAGAGTT AGGCAGGGAT ATTCTCCCCT 7920
TTCGTTGCAG ACCCTTATCC CAACAGCAAG GGGACCAGAC AGGCCAGAAG AAACAGAAGG 7980
AGGCGTTGGA GAGCAAGACA GAGGCAGATC CGTGCGATTA GTGAGCGGAT TCTCAGCTCT 8040
TGTCTGGGAG GACCTCCGGA ACCTGTTGAT CTTCCTCTAC CACCGCTTGA CAGACTCACT 8100 
CTTGATACTG AGGAGGACTC TGGAACTCCT GGGACAGAGT CTCAGCAGGG GACTGCAACT 8160 
ACTGAATGAA CTCAGAACAC ACTTGTGGGG AATACTTGCA TATTGGGGAA AAGAGTTAAG 8220 
GGATAGTGCT ATCAGCTTGC TTAATACAAC AGCTATTGTA GTAGCAGAAG GAACAGATAG 8280 
GATTATAGAA TTAGCACAAA GAATAGGAAG GGGAATATTA CACATACCTA GAAGAATCAG 8340 
ACAAGGCCTA GAAAGAGCAC TGATATAAGA TGGGAAAGAT TTGGTCAAAG AGCAGCCTAG 8400
TAGGATGGCC AGAAATCAGA GAAAGAATGA GAAGACAAAC GCAAGAACCA GCAGTAGAGC 8460
CAGCAGTAGG AGCAGGAGCA GCTTCTCAAG ATCTAGCTAA TCGAGGGGCC ATCACCATAA 8520 
GAAATACTAG AGACAATAAT GAAAGTATAG CTTGGCTAGA AGCACAAGAA GAAGAAGAGG 8580 
AAGTAGGCTT TCCAGTACGC CCTCAGGTAC CATTAAGGCC AATAACCTAT AAACAGGCTT 8640
TTGATCTTTC CTTCTTTTTA AAAGATAAGG GGGGACTGGA AGGGCTAGTT TGGTCCAGAA 8700
AAAGGCAAGA TATTCTAGAC CTCTGGATGT ATCACACACA AGGCATCCTC CCTGACTGGC 8760
ATAACTACAC ACCAGGGCCA GGAATTAGAT ACCCCGTAAC CTTTGGATGG TGCTTCAAAC 8820
TAGTACCATT GTCAGCTGAA GAAGTAGAAG AGGCTAATGA AGGAGACAAC AATGCCCTCT 8880
TACACCCCAT ATGTCAACAT GGAGCAGATG ATGATCATAA AGAAGTGTTG GTGTGGCGAT 8940
TTGACAGCTC CCTAGCAAGA AGACATGTAG CAAGAGAGCT GCATCCGGAG TTTTACAAGA 9000 
ACTGCTGACA AGGGACTTTA CTGCTGACAA GGGACTTTAT ACTTGGGGAC TTTCCGCCAG 9060 
GGACTTTCCA GGGAGGTGTG GTTGGGGGAG TGGCTTGCCC TCAGAGCTGC ATAAAAGCAG 9120 
CCGCTTCTCG CTTGTACTGG GTCTCTCTTG CTGGACCAGA TTAGAGTCTG GGAGCATATT 9180 
GGG 9183 
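
Each row of the sequence description above ends with a running base count (60, 120, ... 9183), which gives a simple way to check a transcribed listing for dropped or extra bases. The following is a minimal sketch of such a check; the function name `parse_sequence_rows` and its error messages are illustrative assumptions and are not part of the listing format.

```python
def parse_sequence_rows(lines):
    """Join the blocks of a nucleotide listing into one string.

    Each non-empty line is expected to hold space-separated blocks of
    bases followed by the cumulative number of bases up to that row.
    """
    blocks_seen = []
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        *blocks, counter = fields
        if not counter.isdigit():
            raise ValueError(f"missing running count in row: {line!r}")
        for block in blocks:
            if set(block) - set("ACGT"):
                raise ValueError(f"unexpected character in block {block!r}")
        total += sum(len(block) for block in blocks)
        if total != int(counter):
            raise ValueError(f"row ends at {total} bases but is labelled {counter}")
        blocks_seen.extend(blocks)
    return "".join(blocks_seen)


# First two rows of SEQ ID NO: 1 as listed above.
rows = [
    "CTTCTCGCTT GTACTGGGTC TCTCTTGCTG GACCAGATTA GAGCCTGGGA GCTCTCTGGC 60",
    "TAGCAGGGAA CCCACTGCTT AAGCCTCAAT AAAGCTTGCC TTGAGTGCTA AAGTGGTGTG 120",
]
sequence = parse_sequence_rows(rows)
print(len(sequence))
```

A row whose count has been split by the scan (for example "144 0") or whose blocks have been damaged fails this check and can then be inspected by hand.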
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 813 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

TTGGAAGGGC TAGTTTGGTC CAGAAAAAGG CAAGATATTC TAGACCTCTG GATGTATCAC 60
ACACAAGGCA TCCTCCCTGA CTGGCATAAC TACACACCAG GGCCAGGAAT TAGATACCCC 120
GTAACCTTTG GATGGTGCTT CAAACTAGTA CCATTGTCAG CTGAAGAAGT AGAAGAGGCT 180
AATGAAGGAG ACAACAATGC CCTCTTACAC CCCATATGTC AACATGGAGC AGATGATGAT 240
CATAAAGAAG TGTTGGTGTG GCGATTTGAC AGCTCCCTAG CAAGAAGACA TGTAGCAAGA 300
GAGCTGCATC CGGAGTTTTA CAAGAACTGC TGACAAGGGA CTTTACTGCT GACAAGGGAC 360
TTTATACTTG GGGACTTTCC GCCAGGGACT TTCCAGGGAG GTGTGGTTGG GGGAGTGGCT 420
TGCCCTCAGA GCTGCATAAA AGCAGCCGCT TCTCGCTTGT ACTGGGTCTC TCTTGCTGGA 480
CTATACAGAT TAGAGCCTGG GAGCTCTCTG GCTAGCAGGG AACCCACTGC TTAAGCCTCA 540
ATAAATACAG CTTGCCTTGA GTGCTAAAGT GGTGTGTGCC CATCCATTCG GTAACTCTGG 600
TACCTAGAGA ATCCCTCAGA CCATCTAGAC TGAGTGAAAA ATCTCTAGCA GTGGCGCCCG 660
AACAGGGACT TAGTTGAAAA CGAAAGTAGA ACCGGAGGCT GAATCTCTCG ACGCAGGACT 720
CGGCTCGTTG GTGCACACAG CGAGAGGCGA GGCGGCGGAA GTGTGAGTAC GCAATTTTGA 780
CTGGCGGTGG CCAGAAAGTA GGAGAGAGGG AGG 813


(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1539 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1536

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATG GGT GCG AGA GCG TCA GTG TTA ACA GGG GGA AAA TTA GAT CAA TGG 48
Met Gly Ala Arg Ala Ser Val Leu Thr Gly Gly Lys Leu Asp Gln Trp
1 5 10 15

GAA TCA ATT TAT TTG AGA CCA GGG GGA AAG AAA AAA TAC AGA ATG AAA 96
Glu Ser Ile Tyr Leu Arg Pro Gly Gly Lys Lys Lys Tyr Arg Met Lys
20 25 30

CAT TTA GTA TGG GCA AGC AGG GAG CTG GAA AGA TTC GCT TGT AAC CCA 144
His Leu Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Cys Asn Pro
35 40 45

GGT CTC ATG GAC ACA GCG GAC GGC TGT GCC AAG TTA CTA AAT CAA TTA 192
Gly Leu Met Asp Thr Ala Asp Gly Cys Ala Lys Leu Leu Asn Gln Leu
50 55 60

GAA CCA GCT CTC AAG ACA GGG TCA GAA GAA CTG CGC TCT TTA TAT AAC 240
Glu Pro Ala Leu Lys Thr Gly Ser Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

GCT CTA GCA GTT CTT TAT TGT GTC CAT AGT AGG ATA CAG ATA CAC AAC 288
Ala Leu Ala Val Leu Tyr Cys Val His Ser Arg Ile Gln Ile His Asn
85 90 95

ACA CAG GAA GCT TTG GAC AAG ATA AAA GAG AAA CAG GAA CAG CAC AAG 336
Thr Gln Glu Ala Leu Asp Lys Ile Lys Glu Lys Gln Glu Gln His Lys
100 105 110

CCC GAG CCA AAA AAC CCA GAA GCA GGG GCA GCG GCA GCA ACT GAT AGC 384
Pro Glu Pro Lys Asn Pro Glu Ala Gly Ala Ala Ala Ala Thr Asp Ser
115 120 125

AAT ATC AGT AGG AAT TAT CCT CTA GTC CAG ACT GCT CAA GGA CAA ATG 432
Asn Ile Ser Arg Asn Tyr Pro Leu Val Gln Thr Ala Gln Gly Gln Met
130 135 140

GTA CAT CAG CCG CTG ACA CCC AGA ACC TTA AAT GCT TGG GTG AAA GTG 480
Val His Gln Pro Leu Thr Pro Arg Thr Leu Asn Ala Trp Val Lys Val
145 150 155 160

ATA GAG GAG AAG GCC TTT AGT CCA GAA GTA ATA CCA ATG TTT ATG GCC 528
Ile Glu Glu Lys Ala Phe Ser Pro Glu Val Ile Pro Met Phe Met Ala
165 170 175

TTG TCA GAA GGG GCA ACG CCC TCA GAT CTA AAT ACT ATG TTA AAT ACA 576
Leu Ser Glu Gly Ala Thr Pro Ser Asp Leu Asn Thr Met Leu Asn Thr
180 185 190

GTA GGG GGA CAT CAG GCA GCA ATG CAG ATG CTG AAG GAA GTC ATC AAT 624
Val Gly Gly His Gln Ala Ala Met Gln Met Leu Lys Glu Val Ile Asn
195 200 205

GAG GAA GCA GCA GAC TGG GAT AGG ACA CAT CCA GTC CCT GTG GGA CCA 672
Glu Glu Ala Ala Asp Trp Asp Arg Thr His Pro Val Pro Val Gly Pro
210 215 220

CTA CCC CCA GGG CAA CTG AGA GAC CCT AGA GGA AGT GAT ATA GCA GGA 720
Leu Pro Pro Gly Gln Leu Arg Asp Pro Arg Gly Ser Asp Ile Ala Gly
225 230 235 240

ACA ACT AGC ACC CTG GCA GAA CAG GTG GCT TGG ATG ACT GCT AAT CCT 768
Thr Thr Ser Thr Leu Ala Glu Gln Val Ala Trp Met Thr Ala Asn Pro
245 250 255

CCT GTT CCA GTA GGA GAT ATT TAT AGA AGA TGG ATA GTC CTG GGG TTA 816
Pro Val Pro Val Gly Asp Ile Tyr Arg Arg Trp Ile Val Leu Gly Leu
260 265 270

AAC AGA ATT GTG AGA ATG TAT AGT CCT GTC AGC ATT CTA GAG ATC AAA 864
Asn Arg Ile Val Arg Met Tyr Ser Pro Val Ser Ile Leu Glu Ile Lys
275 280 285

CAA GGA CCA AAA GAA CCC TTC AGA GAC TAT GTA GAC AGG TTC TAC AAA 912
Gln Gly Pro Lys Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Tyr Lys
290 295 300

ACT CTA AGA GCA GAG CAG GCA ACA CAG GAA GTA AAG AAT TGG ATG ACA 960
Thr Leu Arg Ala Glu Gln Ala Thr Gln Glu Val Lys Asn Trp Met Thr
305 310 315 320

GAA ACA CTC TTA GTA CAA AAT GCA AAC CCA GAT TGT AAA CAG CTC CTA 1008
Glu Thr Leu Leu Val Gln Asn Ala Asn Pro Asp Cys Lys Gln Leu Leu
325 330 335

AAA GCA TTA GGG CCA GGA GCT ACC TTA GAA GAG ATG ATG ACG GCC TGC 1056
Lys Ala Leu Gly Pro Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys
340 345 350

CAG GGA GTG GGG GGA CCA GCA CAT AAG GCA AGA GTG CTA GCA GAG GCT 1104
Gln Gly Val Gly Gly Pro Ala His Lys Ala Arg Val Leu Ala Glu Ala
355 360 365

ATG TCA CAG GTG CAG CAG CCA ACA ACT AGT GTC TTT GCA CAA AGG GGA 1152
Met Ser Gln Val Gln Gln Pro Thr Thr Ser Val Phe Ala Gln Arg Gly
370 375 380

AAC TTT AAA GGC ATA AGG AAA CCC ATT AAA TGT TTC AAT TGT GGC AAA 1200
Asn Phe Lys Gly Ile Arg Lys Pro Ile Lys Cys Phe Asn Cys Gly Lys
385 390 395 400

GAG GGC CAT TTG GCA AGA AAC TGT AAG GCC CCT AGA AGA GGA GGC TGT 1248
Glu Gly His Leu Ala Arg Asn Cys Lys Ala Pro Arg Arg Gly Gly Cys
405 410 415

TGG AAG TGT GGG CAA GAA GGA CAT CAA ATG AAA GAT TGT AAA AAT GAA 1296
Trp Lys Cys Gly Gln Glu Gly His Gln Met Lys Asp Cys Lys Asn Glu
420 425 430

GGA AGA CAG GCT AAT TTT TTA GGG AAG AGC TGG TCT CCC TTC AAA GGG 1344
Gly Arg Gln Ala Asn Phe Leu Gly Lys Ser Trp Ser Pro Phe Lys Gly
435 440 445

AGA CCA GGA AAC TTC CCC CAG ACA ACA ACA AGG AAA GAG CCC ACA GCC 1392
Arg Pro Gly Asn Phe Pro Gln Thr Thr Thr Arg Lys Glu Pro Thr Ala
450 455 460

CCG CCA CTA GAG AGT TAT GGG TTT CAG GAG GAG AAG AGC ACA CAG GGG 1440
Pro Pro Leu Glu Ser Tyr Gly Phe Gln Glu Glu Lys Ser Thr Gln Gly
465 470 475 480

AAG GAG ATG CAG GAG AAC CAG GAG AGG ACA GAG AAC TCT CTG TAC CCA 1488
Lys Glu Met Gln Glu Asn Gln Glu Arg Thr Glu Asn Ser Leu Tyr Pro
485 490 495

CCT TTA ACT TCC CTC AGA TCA CTC TTT GGC AAC GAC CCG TCA TCA CAG 1536
Pro Leu Thr Ser Leu Arg Ser Leu Phe Gly Asn Asp Pro Ser Ser Gln
500 505 510

TAA 1539
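
The CDS feature (LOCATION 1..1536) means that the codons of SEQ ID NO: 3 translate directly into the protein of SEQ ID NO: 4, with the TAA at 1537-1539 as the stop codon. The sketch below shows this translation step using the standard genetic code; the function name `translate_cds` and the one-letter output are illustrative choices and are not part of the listing.

```python
# Standard genetic code, indexed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(dna, start=1, end=None):
    """Translate the 1-based, inclusive region [start, end] of a DNA string.

    Whitespace in the input is ignored, so rows can be pasted as listed.
    Stop codons are returned as '*'.
    """
    dna = "".join(dna.split()).upper()
    end = len(dna) if end is None else end
    region = dna[start - 1:end]
    if len(region) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, len(region), 3))


# First row of SEQ ID NO: 3 (nucleotides 1 to 48).
first_row = "ATG GGT GCG AGA GCG TCA GTG TTA ACA GGG GGA AAA TTA GAT CAA TGG"
print(translate_cds(first_row))
```

For the first row this yields MGARASVLTGGKLDQW, the one-letter form of Met Gly Ala Arg Ala Ser Val Leu Thr Gly Gly Lys Leu Asp Gln Trp shown under the codons.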



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 512 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Gly Ala Arg Ala Ser Val Leu Thr Gly Gly Lys Leu Asp Gln Trp
1 5 10 15

Glu Ser Ile Tyr Leu Arg Pro Gly Gly Lys Lys Lys Tyr Arg Met Lys
20 25 30

His Leu Val Trp Ala Ser Arg Glu Leu Glu Arg Phe Ala Cys Asn Pro
35 40 45

Gly Leu Met Asp Thr Ala Asp Gly Cys Ala Lys Leu Leu Asn Gln Leu
50 55 60

Glu Pro Ala Leu Lys Thr Gly Ser Glu Glu Leu Arg Ser Leu Tyr Asn
65 70 75 80

Ala Leu Ala Val Leu Tyr Cys Val His Ser Arg Ile Gln Ile His Asn
85 90 95

Thr Gln Glu Ala Leu Asp Lys Ile Lys Glu Lys Gln Glu Gln His Lys
100 105 110

Pro Glu Pro Lys Asn Pro Glu Ala Gly Ala Ala Ala Ala Thr Asp Ser
115 120 125

Asn Ile Ser Arg Asn Tyr Pro Leu Val Gln Thr Ala Gln Gly Gln Met
130 135 140

Val His Gln Pro Leu Thr Pro Arg Thr Leu Asn Ala Trp Val Lys Val
145 150 155 160

Ile Glu Glu Lys Ala Phe Ser Pro Glu Val Ile Pro Met Phe Met Ala
165 170 175

Leu Ser Glu Gly Ala Thr Pro Ser Asp Leu Asn Thr Met Leu Asn Thr
180 185 190

Val Gly Gly His Gln Ala Ala Met Gln Met Leu Lys Glu Val Ile Asn
195 200 205

Glu Glu Ala Ala Asp Trp Asp Arg Thr His Pro Val Pro Val Gly Pro
210 215 220

Leu Pro Pro Gly Gln Leu Arg Asp Pro Arg Gly Ser Asp Ile Ala Gly
225 230 235 240

Thr Thr Ser Thr Leu Ala Glu Gln Val Ala Trp Met Thr Ala Asn Pro
245 250 255

Pro Val Pro Val Gly Asp Ile Tyr Arg Arg Trp Ile Val Leu Gly Leu
260 265 270

Asn Arg Ile Val Arg Met Tyr Ser Pro Val Ser Ile Leu Glu Ile Lys
275 280 285

Gln Gly Pro Lys Glu Pro Phe Arg Asp Tyr Val Asp Arg Phe Tyr Lys
290 295 300

Thr Leu Arg Ala Glu Gln Ala Thr Gln Glu Val Lys Asn Trp Met Thr
305 310 315 320

Glu Thr Leu Leu Val Gln Asn Ala Asn Pro Asp Cys Lys Gln Leu Leu
325 330 335

Lys Ala Leu Gly Pro Gly Ala Thr Leu Glu Glu Met Met Thr Ala Cys
340 345 350

Gln Gly Val Gly Gly Pro Ala His Lys Ala Arg Val Leu Ala Glu Ala
355 360 365

Met Ser Gln Val Gln Gln Pro Thr Thr Ser Val Phe Ala Gln Arg Gly
370 375 380

Asn Phe Lys Gly Ile Arg Lys Pro Ile Lys Cys Phe Asn Cys Gly Lys
385 390 395 400

Glu Gly His Leu Ala Arg Asn Cys Lys Ala Pro Arg Arg Gly Gly Cys
405 410 415

Trp Lys Cys Gly Gln Glu Gly His Gln Met Lys Asp Cys Lys Asn Glu
420 425 430

Gly Arg Gln Ala Asn Phe Leu Gly Lys Ser Trp Ser Pro Phe Lys Gly
435 440 445

Arg Pro Gly Asn Phe Pro Gln Thr Thr Thr Arg Lys Glu Pro Thr Ala
450 455 460

Pro Pro Leu Glu Ser Tyr Gly Phe Gln Glu Glu Lys Ser Thr Gln Gly
465 470 475 480

Lys Glu Met Gln Glu Asn Gln Glu Arg Thr Glu Asn Ser Leu Tyr Pro
485 490 495

Pro Leu Thr Ser Leu Arg Ser Leu Phe Gly Asn Asp Pro Ser Ser Gln
500 505 510



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3045 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..3042

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



TTT TTT AGG GAA GAG CTG GTC TCC CTT CAA AGG GAG ACC AGG AAA CTT 48
Phe Phe Arg Glu Glu Leu Val Ser Leu Gln Arg Glu Thr Arg Lys Leu
515 520 525

CCC CCA GAC AAC AAC AAG GAA AGA GCC CAC AGC CCC GCC ACT AGA GAG 96
Pro Pro Asp Asn Asn Lys Glu Arg Ala His Ser Pro Ala Thr Arg Glu
530 535 540

TTA TGG GTT TCA GGA GGA GAA GAG CAC ACA GGG GAA GGA GAT GCA GGA 144
Leu Trp Val Ser Gly Gly Glu Glu His Thr Gly Glu Gly Asp Ala Gly
545 550 555 560

GAA CCA GGA GAG GAC AGA GAA CTC TCT GTA CCC ACC TTT AAC TTC CCT 192
Glu Pro Gly Glu Asp Arg Glu Leu Ser Val Pro Thr Phe Asn Phe Pro
565 570 575

CAG ATC ACT CTT TGG CAA CGA CCC GTC ATC ACA GTA AAA ATA GGG AAA 240
Gln Ile Thr Leu Trp Gln Arg Pro Val Ile Thr Val Lys Ile Gly Lys
580 585 590



GAA GTA AGA GAA GCT CTT TTA GAT ACA GGA GCT GAT GAT ACA GTA ATA 288
Glu Val Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val Ile
595 600 605

GAA GAG CTA CAA TTA GAG GGA AAA TGG AAA CCA AAA ATG ATA GGA GGA 336
Glu Glu Leu Gln Leu Glu Gly Lys Trp Lys Pro Lys Met Ile Gly Gly
610 615 620

ATT GGA GGA TTT ATC AAA GTG AGA CAA TAT GAT AAT ATA ACA GTA GAC 384
Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Asn Ile Thr Val Asp
625 630 635 640

ATA CAG GGA AGA AAA GCA GTT GGT ACA GTA TTA GTA GGA CCA ACA CCT 432
Ile Gln Gly Arg Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr Pro
645 650 655

GTT AAT ATT ATA GGA AGA AAT CTT TTA ACC CAG ATT GGC TGT ACT TTA 480
Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr Leu
660 665 670

AAT TTT CCA ATA AGT CCT ATT GAA ACT GTA CCA GTA AAA TTA AAA CCA 528
Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys Pro
675 680 685

GGA ATG GAT GGC CCA AAG GTA AAA CAA TGG CCT TTG ACA ACA GAA AAA 576
Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Thr Glu Lys
690 695 700

ATA GAG GCA TTA AGA GAA ATT TGT ACA GAA ATG GAA AAG GAA GGA AAA 624
Ile Glu Ala Leu Arg Glu Ile Cys Thr Glu Met Glu Lys Glu Gly Lys
705 710 715 720

ATT TCT AGA ATA GGG CCT GAG AAT CCA TAT AAC ACT CCA ATT TTT GCT 672
Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe Ala
725 730 735

ATA AAA AAG AAA GAT AGC ACT AAA TGG AGA AAA TTA GTA GAT TTC AGG 720
Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe Arg
740 745 750

GAA TTA AAT AAA AGG ACC CAA GAT TTT TGG GAA GTG CAG CTA GGA ATT 768
Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly Ile
755 760 765

CCA CAT CCA GCA GGA TTA AAG CAG AAA AAA TCA GTG ACA GTT TTG GAT 816
Pro His Pro Ala Gly Leu Lys Gln Lys Lys Ser Val Thr Val Leu Asp
770 775 780

GTA GGA GAT GCT TAT TTT TCA TGT CCC TTG GAC AAA GAT TTT AGA AAG 864
Val Gly Asp Ala Tyr Phe Ser Cys Pro Leu Asp Lys Asp Phe Arg Lys
785 790 795 800

TAT ACA GCT TTT ACC ATA CCT AGT ATA AAC AAT GAG ACA CCT GGT ATT 912
Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly Ile
805 810 815




[codons illegible in source] 960
Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala
820 825 830

[codons illegible in source] 1008
Ile Phe Gln Ser Thr Met Thr Lys Ile Leu Glu Pro Phe Arg Glu Lys
835 840 845

[...] ATG GAT GAC CTC TAT GTG [...] 1056
His Pro Glu Ile Ile Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val Gly
850 855 860

[codons illegible in source] 1104
Ser Asp Leu Glu Leu Ala Gln His Arg Glu Ala Val Glu Asp Leu Arg
865 870 875 880

GAT CAT CTT TTG AAG TGG GGC TTT ACG ACC CCT GAC AAA AAA CAT CAG 1152
Asp His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His Gln
885 890 895

[codons illegible in source] 1200
Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp Lys
900 905 910

[...] CCA GAA [...] GAT [...] TGG ACT GTC 1248
Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Val Trp Thr Val
915 920 925

[codons illegible in source] 1296
Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln Ile
930 935 940

[...] AGA GGA [...] 1344
Tyr Pro Gly Ile Arg Val Lys Gln Leu Cys Lys Leu Ile Arg Gly Ala
945 950 955 960

AGA GCT TTG ACA GAA GTA GTC AAC TTT [...] 1392
Arg Ala Leu Thr Glu Val Val Asn Phe Thr Glu Glu Ala Glu Leu Glu
965 970 975

[codons illegible in source] 1440
Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro Leu His Gly Val Tyr
980 985 990

[codons illegible in source] 1488
Tyr Asp Pro Gly Lys Glu Leu Val Ala Glu Ile Gln Lys Gln Gly Gln
995 1000 1005

[codons illegible in source] 1536
Gly Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Leu His Lys Asn Leu Lys
1010 1015 1020

[...] GGA AAG TAT GCA AAA ATG AGA TCT GCC CAT ACT AAT GAT ATA AAA 1584
Thr Gly Lys Tyr Ala Lys Met Arg Ser Ala His Thr Asn Asp Ile Lys
1025 1030 1035 1040

[codons illegible in source] 1632
Gln Leu Val Glu Val Val Arg Lys Val Ala Thr Glu Ser Ile Val Ile
1045 1050 1055

[...] AGA TTA CCA GTA CAA AAG GAA GTG TGG 1680
Trp Gly Lys Thr Pro Lys Phe Arg Leu Pro Val Gln Lys Glu Val Trp
1060 1065 1070

[...] GCA ACT TGG [...] CCT GAG TGG 1728
Glu Ala Trp Trp Thr Asp His Trp Gln Ala Thr Trp Ile Pro Glu Trp
1075 1080 1085

[...] CTT GTA [...] TTA TGG TAT CAG TTA GAA 1776
Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu Trp Tyr Gln Leu Glu
1090 1095 1100

[...] GGG GCA GAA ACT [...] TAT GTA GAT GGA GCA GCT 1824
Thr Glu Pro Ile Ser Gly Ala Glu Thr Phe Tyr Val Asp Gly Ala Ala
1105 1110 1115 1120

AAT AGG GAA ACA AAA TTG GGA AAA GCA GGT TTT GTG ACA GAT AGG GGA 1872
Asn Arg Glu Thr Lys Leu Gly Lys Ala Gly Phe Val Thr Asp Arg Gly
1125 1130 1135

AGA CAG AAA GTG GTC TCT ATT GCA GAC ACC ACC AAT CAA AAG GCT GAC 1920
Arg Gln Lys Val Val Ser Ile Ala Asp Thr Thr Asn Gln Lys Ala Glu
1140 1145 1150

[...] ATG GCC TTA CAA GAG TCA GGA CGG GAT GTA AAC 1968
Leu Gln Ala Ile Leu Met Ala Leu Gln Glu Ser Gly Arg Asp Val Asn
1155 1160 1165

[...] TAT GCT ATG [...] ATA [...] CAT TCA CAG CCA 2016
Ile Val Thr Asp Ser Gln Tyr Ala Met Gly Ile Ile His Ser Gln Pro
1170 1175 1180

[codons illegible in source] 2064
Asp Lys Ser Glu Ser Glu Leu Val Ser Gln Ile Ile Glu Glu Leu Ile
1185 1190 1195 1200

[...] GAA AGA GTT TAT CTC TCT TGG GTA CCT GCA [...] 2112
Lys Lys Glu Arg Val Tyr Leu Ser Trp Val Pro Ala His Lys Gly Ile
1205 1210 1215

[...] GTT AGC TCA GGA ATT AGA [...] 2160
Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser Ser Gly Ile Arg Lys
1220 1225 1230

[codons illegible in source] 2208
Ile Leu Phe Leu Asp Gly Ile Glu Lys Ala Gln Glu Asp His Asp Arg
1235 1240 1245

TAT CAC AGC AAT TGG AAA GCA ATG GCC AGT GAT TTT AAC TTA CCC CCC 2256
Tyr His Ser Asn Trp Lys Ala Met Ala Ser Asp Phe Asn Leu Pro Pro
1250 1255 1260

ATA GTG GCA AAA GAA ATA GTA GCC AGC TGT GAC AAA TGC CAG CTA AAA 2304
Ile Val Ala Lys Glu Ile Val Ala Ser Cys Asp Lys Cys Gln Leu Lys
1265 1270 1275 1280

GGG GAA GCC ATG CAT GGA CAG GTC AAT TGT AGT CCA GGA GTG TGG CAA 2352
Gly Glu Ala Met His Gly Gln Val Asn Cys Ser Pro Gly Val Trp Gln
1285 1290 1295

TTA GAT TGT ACA CAC TTA GAG GGA AAA ATC ATC CTT GTG GCG GTC CAT 2400
Leu Asp Cys Thr His Leu Glu Gly Lys Ile Ile Leu Val Ala Val His
1300 1305 1310

GTG GCC AGT GGC TAC TTA GAA GCA GAA GTT ATT CCT GCA GAG ACA GGA 2448
Val Ala Ser Gly Tyr Leu Glu Ala Glu Val Ile Pro Ala Glu Thr Gly
1315 1320 1325

CAG GAA ACA GCA TAT TTT ATT TTA AAG TTA GCT GGA AGA TGG CCA GTA 2496
Gln Glu Thr Ala Tyr Phe Ile Leu Lys Leu Ala Gly Arg Trp Pro Val
1330 1335 1340

AAA GTT ATA CAC ACT GAT AAT GGA TCC AAT TTC ACT AGT GCC ACT GTA 2544
Lys Val Ile His Thr Asp Asn Gly Ser Asn Phe Thr Ser Ala Thr Val
1345 1350 1355 1360

AAA GCA GCC TGT TGG TGG GCA AAT ATC AAA CAG GAA TTT GGG ATA CCC 2592
Lys Ala Ala Cys Trp Trp Ala Asn Ile Lys Gln Glu Phe Gly Ile Pro
1365 1370 1375

TAC AAT CCT CAA AGT CAG GGA GCA GTA GAG TCC ATG AAT AAA GAA TTA 2640
Tyr Asn Pro Gln Ser Gln Gly Ala Val Glu Ser Met Asn Lys Glu Leu
1380 1385 1390

AAG AAA ATT ATA GGA CAA ATC AGA GAT CAA GCA GAA CAT CTA AAG ACA 2688
Lys Lys Ile Ile Gly Gln Ile Arg Asp Gln Ala Glu His Leu Lys Thr
1395 1400 1405

GCA GTG CAA ATG GCG GTT TTC ATT CAC AAT TTT AAA AGA AAA GGG GGG 2736
Ala Val Gln Met Ala Val Phe Ile His Asn Phe Lys Arg Lys Gly Gly
1410 1415 1420

ATT GGG GGG TAC ACT GCA GGG GAA AGA ATA ATA GAC ATA ATA GCA ACA 2784
Ile Gly Gly Tyr Thr Ala Gly Glu Arg Ile Ile Asp Ile Ile Ala Thr
1425 1430 1435 1440

GAC ATA CAG ACA ACA AAT TTA CAA ACA CAA ATT TTA AAA GTT CAA AAT 2832
Asp Ile Gln Thr Thr Asn Leu Gln Thr Gln Ile Leu Lys Val Gln Asn
1445 1450 1455

TTT CGG GTT TAT TAC AGA GAC AGC AGA GAT CCC ATT TGG AAA GGA CCA 2880
Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro Ile Trp Lys Gly Pro
1460 1465 1470

GCC AAA CTT CTG TGG AAA GGA GAA GGG GCA GTG GTA ATT CAA GAT AAC 2928
Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val Val Ile Gln Asp Asn
1475 1480 1485

GGG GAT ATA AAA GTA GTC CCA CGT AGG AAA GCA AAA ATA ATT AGG GAT 2976
Gly Asp Ile Lys Val Val Pro Arg Arg Lys Ala Lys Ile Ile Arg Asp
1490 1495 1500

TAT GGA AAA CAG ATG GCA GGT GAT GGT TGT GTG GCA AGT GGA CAG GAT 3024
Tyr Gly Lys Gln Met Ala Gly Asp Gly Cys Val Ala Ser Gly Gln Asp
1505 1510 1515 1520

GAA AAT CAG GAA ATG GAA TAG 3045
Glu Asn Gln Glu Met Glu
1525
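
As a quick consistency check on the SEQ ID NO: 5 feature table (a sketch added for illustration, not part of the original listing): a CDS located at 1..3042 spans 3042 nucleotides, which is 1014 codons, the length stated for SEQ ID NO: 6. The three nucleotides left over (3043 to 3045) are the terminal TAG. The helper name `cds_codon_count` is hypothetical.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS given 1-based, inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# Values taken from the SEQ ID NO: 5 header: 3045 base pairs, CDS 1..3042.
total_length = 3045
codons = cds_codon_count(1, 3042)
trailing = total_length - 3042  # nucleotides after the CDS (the stop codon)

print(codons)    # encoded amino acids
print(trailing)  # length of the terminal stop codon
```

The same arithmetic applies to the other nucleic acid entries in this listing, for example 1..576 for SEQ ID NO: 7.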



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1014 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Phe Phe Arg Glu Glu Leu Val Ser Leu Gln Arg Glu Thr Arg Lys Leu
1 5 10 15

Pro Pro Asp Asn Asn Lys Glu Arg Ala His Ser Pro Ala Thr Arg Glu
20 25 30

Leu Trp Val Ser Gly Gly Glu Glu His Thr Gly Glu Gly Asp Ala Gly
35 40 45

Glu Pro Gly Glu Asp Arg Glu Leu Ser Val Pro Thr Phe Asn Phe Pro
50 55 60

Gln Ile Thr Leu Trp Gln Arg Pro Val Ile Thr Val Lys Ile Gly Lys
65 70 75 80

Glu Val Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val Ile
85 90 95

Glu Glu Leu Gln Leu Glu Gly Lys Trp Lys Pro Lys Met Ile Gly Gly
100 105 110

Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Asn Ile Thr Val Asp
115 120 125

Ile Gln Gly Arg Lys Ala Val Gly Thr Val Leu Val Gly Pro Thr Pro
130 135 140

Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys Thr Leu
145 150 155 160

Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val Lys Leu Lys Pro
165 170 175

Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu Thr Thr Glu Lys
180 185 190

Ile Glu Ala Leu Arg Glu Ile Cys Thr Glu Met Glu Lys Glu Gly Lys
195 200 205

Ile Ser Arg Ile Gly Pro Glu Asn Pro Tyr Asn Thr Pro Ile Phe Ala
210 215 220

Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys Leu Val Asp Phe Arg
225 230 235 240

Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val Gln Leu Gly Ile
245 250 255

Pro His Pro Ala Gly Leu Lys Gln Lys Lys Ser Val Thr Val Leu Asp
260 265 270

Val Gly Asp Ala Tyr Phe Ser Cys Pro Leu Asp Lys Asp Phe Arg Lys
275 280 285

Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu Thr Pro Gly Ile
290 295 300

Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala
305 310 315 320

Ile Phe Gln Ser Thr Met Thr Lys Ile Leu Glu Pro Phe Arg Glu Lys
325 330 335

His Pro Glu Ile Ile Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val Gly
340 345 350

Ser Asp Leu Glu Leu Ala Gln His Arg Glu Ala Val Glu Asp Leu Arg
355 360 365

Asp His Leu Leu Lys Trp Gly Phe Thr Thr Pro Asp Lys Lys His Gln
370 375 380

Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu His Pro Asp Lys
385 390 395 400

Trp Thr Val Gln Pro Ile Lys Leu Pro Glu Lys Asp Val Trp Thr Val
405 410 415

Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp Ala Ser Gln Ile
420 425 430

Tyr Pro Gly Ile Arg Val Lys Gln Leu Cys Lys Leu Ile Arg Gly Ala
435 440 445

Arg Ala Leu Thr Glu Val Val Asn Phe Thr Glu Glu Ala Glu Leu Glu
450 455 460

Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro Leu His Gly Val Tyr
465 470 475 480

Tyr Asp Pro Gly Lys Glu Leu Val Ala Glu Ile Gln Lys Gln Gly Gln
485 490 495

Gly Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Leu His Lys Asn Leu Lys
500 505 510

Thr Gly Lys Tyr Ala Lys Met Arg Ser Ala His Thr Asn Asp Ile Lys
515 520 525

Gln Leu Val Glu Val Val Arg Lys Val Ala Thr Glu Ser Ile Val Ile
530 535 540

Trp Gly Lys Thr Pro Lys Phe Arg Leu Pro Val Gln Lys Glu Val Trp
545 550 555 560

Glu Ala Trp Trp Thr Asp His Trp Gln Ala Thr Trp Ile Pro Glu Trp
565 570 575

Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu Trp Tyr Gln Leu Glu
580 585 590

Thr Glu Pro Ile Ser Gly Ala Glu Thr Phe Tyr Val Asp Gly Ala Ala
595 600 605

Asn Arg Glu Thr Lys Leu Gly Lys Ala Gly Phe Val Thr Asp Arg Gly
610 615 620

Arg Gln Lys Val Val Ser Ile Ala Asp Thr Thr Asn Gln Lys Ala Glu
625 630 635 640

Leu Gln Ala Ile Leu Met Ala Leu Gln Glu Ser Gly Arg Asp Val Asn
645 650 655

Ile Val Thr Asp Ser Gln Tyr Ala Met Gly Ile Ile His Ser Gln Pro
660 665 670

Asp Lys Ser Glu Ser Glu Leu Val Ser Gln Ile Ile Glu Glu Leu Ile
675 680 685

Lys Lys Glu Arg Val Tyr Leu Ser Trp Val Pro Ala His Lys Gly Ile
690 695 700

Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser Ser Gly Ile Arg Lys
705 710 715 720

Ile Leu Phe Leu Asp Gly Ile Glu Lys Ala Gln Glu Asp His Asp Arg
725 730 735

Tyr His Ser Asn Trp Lys Ala Met Ala Ser Asp Phe Asn Leu Pro Pro
740 745 750

Ile Val Ala Lys Glu Ile Val Ala Ser Cys Asp Lys Cys Gln Leu Lys
755 760 765

Gly Glu Ala Met His Gly Gln Val Asn Cys Ser Pro Gly Val Trp Gln
770 775 780

Leu Asp Cys Thr His Leu Glu Gly Lys Ile Ile Leu Val Ala Val His
785 790 795 800

Val Ala Ser Gly Tyr Leu Glu Ala Glu Val Ile Pro Ala Glu Thr Gly
805 810 815

Gln Glu Thr Ala Tyr Phe Ile Leu Lys Leu Ala Gly Arg Trp Pro Val
820 825 830

Lys Val Ile His Thr Asp Asn Gly Ser Asn Phe Thr Ser Ala Thr Val
835 840 845

Lys Ala Ala Cys Trp Trp Ala Asn Ile Lys Gln Glu Phe Gly Ile Pro
850 855 860

Tyr Asn Pro Gln Ser Gln Gly Ala Val Glu Ser Met Asn Lys Glu Leu
865 870 875 880

Lys Lys Ile Ile Gly Gln Ile Arg Asp Gln Ala Glu His Leu Lys Thr
885 890 895

Ala Val Gln Met Ala Val Phe Ile His Asn Phe Lys Arg Lys Gly Gly
900 905 910

Ile Gly Gly Tyr Thr Ala Gly Glu Arg Ile Ile Asp Ile Ile Ala Thr
915 920 925

Asp Ile Gln Thr Thr Asn Leu Gln Thr Gln Ile Leu Lys Val Gln Asn
930 935 940

Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro Ile Trp Lys Gly Pro
945 950 955 960

Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val Val Ile Gln Asp Asn
965 970 975

Gly Asp Ile Lys Val Val Pro Arg Arg Lys Ala Lys Ile Ile Arg Asp
980 985 990

Tyr Gly Lys Gln Met Ala Gly Asp Gly Cys Val Ala Ser Gly Gln Asp
995 1000 1005



Glu Asn Gln Glu Met Glu
1010

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 579 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..576

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



ATG GAA AAC AGA TGG CAG GTG ATG GTT GTG TGG CAA GTG GAC AGG ATG 48
Met Glu Asn Arg Trp Gln Val Met Val Val Trp Gln Val Asp Arg Met
1015 1020 1025 1030

AAA ATC AGG AAA TGG AAT AGC TTA GTA AAA CAT CAT ATG TAT GTG TCA 96
Lys Ile Arg Lys Trp Asn Ser Leu Val Lys His His Met Tyr Val Ser
1035 1040 1045

AAA AAG GCA AAA GGA TGG TAT TAT AGA CAT CAT TAT GAA ACA CAT CAC 144
Lys Lys Ala Lys Gly Trp Tyr Tyr Arg His His Tyr Glu Thr His His
1050 1055 1060

CCA AAA ATA AGT TCA GAA GTA CAT ATC CCA GTA GGT CAG GCA AGA TTA 192
Pro Lys Ile Ser Ser Glu Val His Ile Pro Val Gly Gln Ala Arg Leu
1065 1070 1075

GTG ACA GTC ACT TAT TGG GGG CTA ACA ACA GGA GAA CAG TCT TGG CAT 240
Val Thr Val Thr Tyr Trp Gly Leu Thr Thr Gly Glu Gln Ser Trp His
1080 1085 1090

CTA GGA CAT GGA GTA TCC ATA GAA TGG AGA CTA AGA AAA TAC AAG ACA 288
Leu Gly His Gly Val Ser Ile Glu Trp Arg Leu Arg Lys Tyr Lys Thr
1095 1100 1105 1110

CAA GTT GAT CCT GAA ATG GCA GAC AAG CTA ATA CAT CTT CAT TAT TTT 336
Gln Val Asp Pro Glu Met Ala Asp Lys Leu Ile His Leu His Tyr Phe
1115 1120 1125

GAT TGT TTT ACA GCC TCT GCC ATA AGG CAA GCG GTC TTA GGG AGA CCA 384
Asp Cys Phe Thr Ala Ser Ala Ile Arg Gln Ala Val Leu Gly Arg Pro
1130 1135 1140

GTA TTA CCT AGG TGT GAA TAT CCA GCA GGG CAC AAA CAG GTA GGC ACC 432
Val Leu Pro Arg Cys Glu Tyr Pro Ala Gly His Lys Gln Val Gly Thr
1145 1150 1155

CTA CAA TAT CTA GCA CTA ACA GCC TGG GTG GGA GCA AAG AAG AGA AAG 480
Leu Gln Tyr Leu Ala Leu Thr Ala Trp Val Gly Ala Lys Lys Arg Lys
1160 1165 1170

CCA CCC TTA CCT AGT GTG ACT AAG CTA ACA GAA GAT AGA TGG AAC GAG 528
Pro Pro Leu Pro Ser Val Thr Lys Leu Thr Glu Asp Arg Trp Asn Glu
1175 1180 1185 1190

CAC CAG AAG ATG CAG GGC CAC AGA GGG AAC CCT ATA ATG AAT GGG CAC 576
His Gln Lys Met Gln Gly His Arg Gly Asn Pro Ile Met Asn Gly His
1195 1200 1205

TAG 579



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Glu Asn Arg Trp Gln Val Met Val Val Trp Gln Val Asp Arg Met
1 5 10 15

Lys Ile Arg Lys Trp Asn Ser Leu Val Lys His His Met Tyr Val Ser
20 25 30

Lys Lys Ala Lys Gly Trp Tyr Tyr Arg His His Tyr Glu Thr His His
35 40 45

Pro Lys Ile Ser Ser Glu Val His Ile Pro Val Gly Gln Ala Arg Leu
50 55 60

Val Thr Val Thr Tyr Trp Gly Leu Thr Thr Gly Glu Gln Ser Trp His
65 70 75 80

Leu Gly His Gly Val Ser Ile Glu Trp Arg Leu Arg Lys Tyr Lys Thr
85 90 95

Gln Val Asp Pro Glu Met Ala Asp Lys Leu Ile His Leu His Tyr Phe
100 105 110

Asp Cys Phe Thr Ala Ser Ala Ile Arg Gln Ala Val Leu Gly Arg Pro
115 120 125

Val Leu Pro Arg Cys Glu Tyr Pro Ala Gly His Lys Gln Val Gly Thr
130 135 140

Leu Gln Tyr Leu Ala Leu Thr Ala Trp Val Gly Ala Lys Lys Arg Lys
145 150 155 160

Pro Pro Leu Pro Ser Val Thr Lys Leu Thr Glu Asp Arg Trp Asn Glu
165 170 175

His Gln Lys Met Gln Gly His Arg Gly Asn Pro Ile Met Asn Gly His
180 185 190



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 288 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..285

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



ATG GAA CGA GCA CCA GAA GAT GCA GGG CCA CAG AGG GAA CCC TAT AAT 48
Met Glu Arg Ala Pro Glu Asp Ala Gly Pro Gln Arg Glu Pro Tyr Asn
195 200 205

GAA TGG GCA CTA GAA TTA TTA GAA GAA TTA AAA AAT GAA GCT GTG CGC 96
Glu Trp Ala Leu Glu Leu Leu Glu Glu Leu Lys Asn Glu Ala Val Arg
210 215 220

CAT TTT CCA AGG ATT TGG CTA CAT GGG TTA GGA CAA CAC ATC TAT AAC 144
His Phe Pro Arg Ile Trp Leu His Gly Leu Gly Gln His Ile Tyr Asn
225 230 235 240

ACA TAT GGA GAC ACC TGG GAG GGG GTA GAG GCA ATT ATC AGG ATA CTA 192
Thr Tyr Gly Asp Thr Trp Glu Gly Val Glu Ala Ile Ile Arg Ile Leu
245 250 255

CAA CAA TTA CTG TTT ATC CAT TAT AGG ATT GGC TGC CAG CAC AGC AGA 240
Gln Gln Leu Leu Phe Ile His Tyr Arg Ile Gly Cys Gln His Ser Arg
260 265 270

ATA GGG ATC ACT CCT CAA AGG AGA AGG AAT GGA ACC AGT AGA TCC 285
Ile Gly Ile Thr Pro Gln Arg Arg Arg Asn Gly Thr Ser Arg Ser
275 280 285

TAG 288



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 95 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Glu Arg Ala Pro Glu Asp Ala Gly Pro Gln Arg Glu Pro Tyr Asn
1 5 10 15

Glu Trp Ala Leu Glu Leu Leu Glu Glu Leu Lys Asn Glu Ala Val Arg
20 25 30

His Phe Pro Arg Ile Trp Leu His Gly Leu Gly Gln His Ile Tyr Asn
35 40 45

Thr Tyr Gly Asp Thr Trp Glu Gly Val Glu Ala Ile Ile Arg Ile Leu
50 55 60

Gln Gln Leu Leu Phe Ile His Tyr Arg Ile Gly Cys Gln His Ser Arg
65 70 75 80

Ile Gly Ile Thr Pro Gln Arg Arg Arg Asn Gly Thr Ser Arg Ser
85 90 95

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 252 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..249

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:



ATG CTG TCA TTG GGA TTC ATA GCG TTA GGA GCA GCA GTT AGC ATA GCA 48
Met Leu Ser Leu Gly Phe Ile Ala Leu Gly Ala Ala Val Ser Ile Ala
100 105 110

GTA ATA GTC TGG GCA TTA CTA TAT AGA GAA TAT AAG AAA ATA AAA TTG 96
Val Ile Val Trp Ala Leu Leu Tyr Arg Glu Tyr Lys Lys Ile Lys Leu
115 120 125

CAG GAA AAA ATA AAA CAC ATA AGA CAG AGA ATA AGA GAA AGA GAA GAA 144
Gln Glu Lys Ile Lys His Ile Arg Gln Arg Ile Arg Glu Arg Glu Glu
130 135 140

GAT AGT GGC AAT GAA AGT GAT GGG GAT GCA GAG TGG TTG GAT GGG GAT 192
Asp Ser Gly Asn Glu Ser Asp Gly Asp Ala Glu Trp Leu Asp Gly Asp
145 150 155

GAA GAG TGG TTG GTT ACT CTT CTA TCT TCT AGT AAG CTT GAT CAA GGT 240
Glu Glu Trp Leu Val Thr Leu Leu Ser Ser Ser Lys Leu Asp Gln Gly
160 165 170 175

AAT TGG GTC TGA 252
Asn Trp Val



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 83 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Leu Ser Leu Gly Phe Ile Ala Leu Gly Ala Ala Val Ser Ile Ala
1 5 10 15

Val Ile Val Trp Ala Leu Leu Tyr Arg Glu Tyr Lys Lys Ile Lys Leu
20 25 30

Gln Glu Lys Ile Lys His Ile Arg Gln Arg Ile Arg Glu Arg Glu Glu
35 40 45

Asp Ser Gly Asn Glu Ser Asp Gly Asp Ala Glu Trp Leu Asp Gly Asp
50 55 60

Glu Glu Trp Leu Val Thr Leu Leu Ser Ser Ser Lys Leu Asp Gln Gly
65 70 75 80

Asn Trp Val



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 306 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..303

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:



ATG GAA CCA GTA GAT CCT AGA TTA GAG CCC TGG AAT CAT CCA GGA AGC 48
Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Asn His Pro Gly Ser
85 90 95

CAA CCT AAA ACA GCT TGC AAT AAT TGC TAT TGT AAA AGA TGT TGC TAT 96
Gln Pro Lys Thr Ala Cys Asn Asn Cys Tyr Cys Lys Arg Cys Cys Tyr
100 105 110 115

CAC TGC TTA TAT TGC TTC ACA AAG AAA GGC TTA GGC ATC TCA TAT GGC 144
His Cys Leu Tyr Cys Phe Thr Lys Lys Gly Leu Gly Ile Ser Tyr Gly
120 125 130

AGG AAG AAG CGG AGT CAA CGA CGA AGA ACT CCT CAG AGC AGT AAG AGT 192
Arg Lys Lys Arg Ser Gln Arg Arg Arg Thr Pro Gln Ser Ser Lys Ser
135 140 145

CAT CAA GAT CTT ATA CCA GAG CAG CCC TTA TCC CAA CAG CAA GGG GAC 240
His Gln Asp Leu Ile Pro Glu Gln Pro Leu Ser Gln Gln Gln Gly Asp
150 155 160

CAG ACA GGC CAG AAG AAA CAG AAG GAG GCG TTG GAG AGC AAG ACA GAG 288
Gln Thr Gly Gln Lys Lys Gln Lys Glu Ala Leu Glu Ser Lys Thr Glu
165 170 175

GCA GAT CCG TGC GAT TAG 306
Ala Des Pro Cys Asp
180
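
The amino acid line printed under each codon line follows from the standard genetic code. As a minimal sketch (added for illustration, assuming the standard code applies, which the printed translations are consistent with), the first codon line of SEQ ID NO: 13 can be translated and rendered in the three-letter notation used in this listing. The names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) in reading frame 1."""
    seq = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# First codon line of SEQ ID NO: 13 as printed in the listing.
first_line = "ATG GAA CCA GTA GAT CCT AGA TTA GAG CCC TGG AAT CAT CCA GGA AGC"
protein = translate(first_line)
three = " ".join(THREE_LETTER[aa] for aa in protein)

print(protein)
print(three)
```

Running the same function over a full entry is a simple way to spot transcription errors: any codon whose translation disagrees with the printed residue is suspect.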



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 101 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Glu Pro Val Asp Pro Arg Leu Glu Pro Trp Asn His Pro Gly Ser
1 5 10 15

Gln Pro Lys Thr Ala Cys Asn Asn Cys Tyr Cys Lys Arg Cys Cys Tyr
20 25 30

His Cys Leu Tyr Cys Phe Thr Lys Lys Gly Leu Gly Ile Ser Tyr Gly
35 40 45

Arg Lys Lys Arg Ser Gln Arg Arg Arg Thr Pro Gln Ser Ser Lys Ser
50 55 60

His Gln Asp Leu Ile Pro Glu Gln Pro Leu Ser Gln Gln Gln Gly Asp
65 70 75 80

Gln Thr Gly Gln Lys Lys Gln Lys Glu Ala Leu Glu Ser Lys Thr Glu
85 90 95

Ala Asp Pro Cys Asp
100

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 369 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..366

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:



ATG GCA GGA AGA AGC GGA GTC AAC GAC GAA GAA CTC CTC AGA GCA GTA 48
Met Ala Gly Arg Ser Gly Val Asn Asp Glu Glu Leu Leu Arg Ala Val
105 110 115

AGA GTC ATC AAG ATC TTA TAC CAG AGC AGT TAT CCC AAC AGC AAG GGG 96
Arg Val Ile Lys Ile Leu Tyr Gln Ser Ser Tyr Pro Asn Ser Lys Gly
120 125 130

ACC AGA CAG GCC AGA AGA AAC AGA AGG AGG CGT TGG AGA GCA AGA CAG 144
Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Ala Arg Gln
135 140 145

AGG CAG ATC CGT GCG ATT AGT GAG CGG ATT CTC AGC TCT TGT CTG GGA 192
Arg Gln Ile Arg Ala Ile Ser Glu Arg Ile Leu Ser Ser Cys Leu Gly
150 155 160 165

GGA CCT CCG GAA CCT GTT GAT CTT CCT CTA CCA CCG CTT GAC AGA CTC 240
Gly Pro Pro Glu Pro Val Asp Leu Pro Leu Pro Pro Leu Asp Arg Leu
170 175 180

ACT CTT GAT ACT GAG GAG GAC TCT GGA ACT CCT GGG ACA GAG TCT CAG 288
Thr Leu Asp Thr Glu Glu Asp Ser Gly Thr Pro Gly Thr Glu Ser Gln
185 190 195

CAG GGG ACT GCA ACT ACT GAA TGA ACT CAG AAC ACA CTT GTG GGG AAT 336
Gln Gly Thr Ala Thr Thr Glu * Thr Gln Asn Thr Leu Val Gly Asn
200 205 210

ACT TGC ATA TTG GGG AAA AGA GTT AAG GGA TAG 369
Thr Cys Ile Leu Gly Lys Arg Val Lys Gly
215 220
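
The `*` in the translation of SEQ ID NO: 15 marks an in-frame TGA inside the annotated CDS. The following sketch (added for illustration, not part of the original listing) scans the printed sequence in reading frame 1 and reports every stop codon with its codon index counted from the start of this sequence. The name `in_frame_stops` is hypothetical. The listing's own residue numbers are a running count that continues from the preceding sequences, so the index reported here corresponds to position 104 in SEQ ID NO: 16 rather than to the running number printed under the codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(dna: str) -> list:
    """Return (codon_index, codon) for each stop codon in frame 1 (1-based)."""
    seq = "".join(dna.split()).upper()
    return [
        (i // 3 + 1, seq[i:i + 3])
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# SEQ ID NO: 15 as printed in the listing (369 nucleotides).
SEQ_ID_15 = """
ATG GCA GGA AGA AGC GGA GTC AAC GAC GAA GAA CTC CTC AGA GCA GTA
AGA GTC ATC AAG ATC TTA TAC CAG AGC AGT TAT CCC AAC AGC AAG GGG
ACC AGA CAG GCC AGA AGA AAC AGA AGG AGG CGT TGG AGA GCA AGA CAG
AGG CAG ATC CGT GCG ATT AGT GAG CGG ATT CTC AGC TCT TGT CTG GGA
GGA CCT CCG GAA CCT GTT GAT CTT CCT CTA CCA CCG CTT GAC AGA CTC
ACT CTT GAT ACT GAG GAG GAC TCT GGA ACT CCT GGG ACA GAG TCT CAG
CAG GGG ACT GCA ACT ACT GAA TGA ACT CAG AAC ACA CTT GTG GGG AAT
ACT TGC ATA TTG GGG AAA AGA GTT AAG GGA TAG
"""

stops = in_frame_stops(SEQ_ID_15)
print(stops)
```

The scan finds two stops: the internal TGA and the terminal TAG that follows the annotated CDS.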



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 122 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Ala Gly Arg Ser Gly Val Asn Asp Glu Glu Leu Leu Arg Ala Val
1 5 10 15

Arg Val Ile Lys Ile Leu Tyr Gln Ser Ser Tyr Pro Asn Ser Lys Gly
20 25 30

Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Ala Arg Gln
35 40 45

Arg Gln Ile Arg Ala Ile Ser Glu Arg Ile Leu Ser Ser Cys Leu Gly
50 55 60

Gly Pro Pro Glu Pro Val Asp Leu Pro Leu Pro Pro Leu Asp Arg Leu
65 70 75 80

Thr Leu Asp Thr Glu Glu Asp Ser Gly Thr Pro Gly Thr Glu Ser Gln
85 90 95

Gln Gly Thr Ala Thr Thr Glu * Thr Gln Asn Thr Leu Val Gly Asn
100 105 110

Thr Cys Ile Leu Gly Lys Arg Val Lys Gly
115 120

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2559 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..2556

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATG AAA GTG ATG GGG ATG CAG AGT GGT TGG ATG GGG ATG AAG AGT GGT 48
Met Lys Val Met Gly Met Gln Ser Gly Trp Met Gly Met Lys Ser Gly
125 130 135

TGG TTA CTC TTC TAT CTT CTA GTA AGC TTG ATC AAG GTA ATT GGG TCT 96
Trp Leu Leu Phe Tyr Leu Leu Val Ser Leu Ile Lys Val Ile Gly Ser
140 145 150

GAA CAA CAT TGG GTA ACA GTG TAC TAT GGG GTA CCA GTA TGG AGA GAA 144
Glu Gln His Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Arg Glu
155 160 165 170

GCA GAG ACA ACT CTT TTC TGT GCT TCA GAT GCT AAA GCC CAT AGT ACA 192
Ala Glu Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala His Ser Thr
175 180 185

GAG GCT CAC AAC ATC TGG GCC ACA CAA GCA TGT GTT CCT ACT GAT CCC 240
Glu Ala His Asn Ile Trp Ala Thr Gln Ala Cys Val Pro Thr Asp Pro
190 195 200

AAT CCA CAA GAA GTG CTA TTA CCC AAT GTA ACT GAA AAA TTT AAT ATG 288
Asn Pro Gln Glu Val Leu Leu Pro Asn Val Thr Glu Lys Phe Asn Met
205 210 215

TGG GAA AAT AAA ATG GCA GAC CAA ATG CAA GAG GAT ATT ATC AGT CTG 336
Trp Glu Asn Lys Met Ala Asp Gln Met Gln Glu Asp Ile Ile Ser Leu
220 225 230

TGG GAA CAG AGC TTA AAG CCC TGT GTT AAA TTA ACC CCA TTA TGT GTA 384
Trp Glu Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val
235 240 245 250

ACT ATG CTT TGT AAC GAT AGC TAT GGG GAG GAA AGG AAC AAT ACA AAT 432
Thr Met Leu Cys Asn Asp Ser Tyr Gly Glu Glu Arg Asn Asn Thr Asn
255 260 265

ATG ACA ACA AGA GAA CCA GAC ATA GGA TAC AAA CAA ATG AAA AAT TGC 480
Met Thr Thr Arg Glu Pro Asp Ile Gly Tyr Lys Gln Met Lys Asn Cys
270 275 280

TCA TTC AAT GCA ACC ACT GAG CTA ACA GAT AAA AAG AAG CAA GTT TAC 528
Ser Phe Asn Ala Thr Thr Glu Leu Thr Asp Lys Lys Lys Gln Val Tyr
285 290 295

TCT CTG TTT TAT GTA GAA GAT GTA GTA CCA ATC AAT GCC TAT AAT AAA 576
Ser Leu Phe Tyr Val Glu Asp Val Val Pro Ile Asn Ala Tyr Asn Lys
300 305 310

ACA TAT AGG CTA ATA AAT TGT AAT ACC ACA GCT GTG ACA CAA GCT TGT 624
Thr Tyr Arg Leu Ile Asn Cys Asn Thr Thr Ala Val Thr Gln Ala Cys
315 320 325 330

CCT AAG ACT TCC TTT GAG CCA ATT CCA ATA CAT TAC TGT GCA CCA CCA 672
Pro Lys Thr Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Ala Pro Pro
335 340 345

GGC TTT GCC ATT ATG AAA TGT AAT GAA GGA AAC TTT AGT GGA AAT GGA 720
Gly Phe Ala Ile Met Lys Cys Asn Glu Gly Asn Phe Ser Gly Asn Gly
350 355 360



AGC TGT ACA AAT GTG AGT ACT GTA CAA TGC ACA CAT GGA ATA AAG CCA 768
Ser Cys Thr Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Lys Pro
365 370 375

GTG ATA TCC ACT CAG TTA ATC CTA AAT GGA AGC TTA AAT ACA GAT GGA 816
Val Ile Ser Thr Gln Leu Ile Leu Asn Gly Ser Leu Asn Thr Asp Gly
380 385 390

ATT GTT ATT AGA AAT GAT AGT CAC AGT AAT CTG TTG GTG CAA TGG AAT 864
Ile Val Ile Arg Asn Asp Ser His Ser Asn Leu Leu Val Gln Trp Asn
395 400 405 410

GAG ACA GTG CCA ATA AAT TGT ACA AGG CCA GGA AAT AAT ACA GGA GGA 912
Glu Thr Val Pro Ile Asn Cys Thr Arg Pro Gly Asn Asn Thr Gly Gly
415 420 425

CAG GTG CAG ATA GGA CCT GCT ATG ACA TTT TAT AAC ATA GAA AAA ATA 960
Gln Val Gln Ile Gly Pro Ala Met Thr Phe Tyr Asn Ile Glu Lys Ile
430 435 440

GTA GGA GAC ATT AGA CAA GCA TAC TGT AAT GTC TCT AAA GAA CTA TGG 1008
Val Gly Asp Ile Arg Gln Ala Tyr Cys Asn Val Ser Lys Glu Leu Trp
445 450 455

GAA CCA ATG TGG AAT AGA ACA AGA GAG GAA ATA AAG AAA ATC CTG GGG 1056
Glu Pro Met Trp Asn Arg Thr Arg Glu Glu Ile Lys Lys Ile Leu Gly
460 465 470

AAA AAC AAC ATA ACC TTC AGG GCT CGA GAG AGG AAT GAA GGA GAC CTA 1104
Lys Asn Asn Ile Thr Phe Arg Ala Arg Glu Arg Asn Glu Gly Asp Leu
475 480 485 490

GAA GTG ACA CAC TTA ATG TTC AAT TGT AGA GGA GAG TTT TTC TAT TGT 1152
Glu Val Thr His Leu Met Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys
495 500 505

AAC ACT TCC AAA TTA TTT AAT GAG GAA TTA CTT AAC GAG ACA GGT GAG 1200
Asn Thr Ser Lys Leu Phe Asn Glu Glu Leu Leu Asn Glu Thr Gly Glu
510 515 520

CCT ATT ACT CTG CCT TGT AGA ATA AGA CAG ATT GTA AAT TTG TGG ACA 1248
Pro Ile Thr Leu Pro Cys Arg Ile Arg Gln Ile Val Asn Leu Trp Thr
525 530 535

AGG GTA GGA AAA GGA ATT TAT GCA CCA CCA ATT CGG GGA GTT CTT AAC 1296
Arg Val Gly Lys Gly Ile Tyr Ala Pro Pro Ile Arg Gly Val Leu Asn
540 545 550

TGT ACC TCC AAT ATT ACT GGA CTG GTT CTA GAA TAT AGT GGT GGG CCT 1344
Cys Thr Ser Asn Ile Thr Gly Leu Val Leu Glu Tyr Ser Gly Gly Pro
555 560 565 570

GAC ACC AAG GAA ACA ATA GTA TAT CCC TCA GGA GGA AAC ATG GTT AAT 1392
Asp Thr Lys Glu Thr Ile Val Tyr Pro Ser Gly Gly Asn Met Val Asn
575 580 585



CTC TGG AGA CAA GAG TTG TAT AAG TAC AAA GTA GTT AGC ATA GAA CCC 1440
Leu Trp Arg Gln Glu Leu Tyr Lys Tyr Lys Val Val Ser Ile Glu Pro
590 595 600

ATA GGA GTA GCA CCA GGT AAA GCT AAA AGA CGC ACA GTG AGT AGA GAA 1488
Ile Gly Val Ala Pro Gly Lys Ala Lys Arg Arg Thr Val Ser Arg Glu
605 610 615

AAA AGA GCA GCC TTT GGA CTA GGT GCG CTG TTT CTT GGG TTT CTT GGA 1536
Lys Arg Ala Ala Phe Gly Leu Gly Ala Leu Phe Leu Gly Phe Leu Gly
620 625 630

GCA GCA GGG AGC ACT ATG GGC GCA GCG TCA ATA ACG CTG ACG GTA CAG 1584
Ala Ala Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr Val Gln
635 640 645 650

GCC CGG ACA TTA TTA TCT GGG ATA GTG CAA CAG CAG AAT ATT CTG TTG 1632
Ala Arg Thr Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Ile Leu Leu
655 660 665

AGA GCA ATA GAG GCG CAA CAA CAT TTG TTG CAA CTC TCA ATC TGG GGC 1680
Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Ser Ile Trp Gly
670 675 680

ATT AAA CAG CTC CAG GCA AAA GTC CTT GCT ATA GAA AGA TAC CTT AGG 1728
Ile Lys Gln Leu Gln Ala Lys Val Leu Ala Ile Glu Arg Tyr Leu Arg
685 690 695

GAT CAG CAA ATC CTA AGT CTA TGG GGC TGC TCA GGA AAA ACA ATA TGC 1776
Asp Gln Gln Ile Leu Ser Leu Trp Gly Cys Ser Gly Lys Thr Ile Cys
700 705 710

TAT ACC ACT GTG CCT TGG AAT GAG ACT TGG AGC AAC AAT ACC TCT TAT 1824
Tyr Thr Thr Val Pro Trp Asn Glu Thr Trp Ser Asn Asn Thr Ser Tyr
715 720 725 730

GAT ACA ATC TGG AAT AAT TTA ACC TGG CAA CAA TGG GAT GAG AAA GTA 1872
Asp Thr Ile Trp Asn Asn Leu Thr Trp Gln Gln Trp Asp Glu Lys Val
735 740 745

AGA AAC TAT TCA GGT GTC ATT TTT GGA CTT ATA GAA CAG GCA CAA GAA 1920
Arg Asn Tyr Ser Gly Val Ile Phe Gly Leu Ile Glu Gln Ala Gln Glu
750 755 760

CAA CAG AAC ACA AAT GAG AAA TCA CTC TTG GAA TTG GAT CAA TGG GAC 1968
Gln Gln Asn Thr Asn Glu Lys Ser Leu Leu Glu Leu Asp Gln Trp Asp
765 770 775

AGT CTG TGG AGC TGG TTT GGT ATT ACA AAA TGG CTG TGG TAT ATA AAA 2016
Ser Leu Trp Ser Trp Phe Gly Ile Thr Lys Trp Leu Trp Tyr Ile Lys
780 785 790



ATA GCT ATA ATG ATA GTA GCA GGC ATT GTA GGC ATA AGA ATC ATA AGT 2064
Ile Ala Ile Met Ile Val Ala Gly Ile Val Gly Ile Arg Ile Ile Ser
795 800 805 810

ATA GTA ATA ACT ATA ATA GCA AGA GTT AGG CAG GGA TAT TCT CCC CTT 2112
Ile Val Ile Thr Ile Ile Ala Arg Val Arg Gln Gly Tyr Ser Pro Leu
815 820 825



TCG TTG CAG ACC CTT ATC CCA ACA GCA AGG GGA CCA GAC AGG CCA GAA 2160
Ser Leu Gln Thr Leu Ile Pro Thr Ala Arg Gly Pro Asp Arg Pro Glu
830 835 840

GAA ACA GAA GGA GGC GTT GGA GAG CAA GAC AGA GGC AGA TCC GTG CGA 2208
Glu Thr Glu Gly Gly Val Gly Glu Gln Asp Arg Gly Arg Ser Val Arg
845 850 855

TTA GTG AGC GGA TTC TCA GCT CTT GTC TGG GAG GAC CTC CGG AAC CTG 2256
Leu Val Ser Gly Phe Ser Ala Leu Val Trp Glu Asp Leu Arg Asn Leu
860 865 870

TTG ATC TTC CTC TAC CAC CGC TTG ACA GAC TCA CTC TTG ATA CTG AGG 2304
Leu Ile Phe Leu Tyr His Arg Leu Thr Asp Ser Leu Leu Ile Leu Arg
875 880 885 890

AGG ACT CTG GAA CTC CTG GGA CAG AGT CTC AGC AGG GGA CTG CAA CTA 2352
Arg Thr Leu Glu Leu Leu Gly Gln Ser Leu Ser Arg Gly Leu Gln Leu
895 900 905

CTG AAT GAA CTC AGA ACA CAC TTG TGG GGA ATA CTT GCA TAT TGG GGA 2400
Leu Asn Glu Leu Arg Thr His Leu Trp Gly Ile Leu Ala Tyr Trp Gly
910 915 920

AAA GAG TTA AGG GAT AGT GCT ATC AGC TTG CTT AAT ACA ACA GCT ATT 2448
Lys Glu Leu Arg Asp Ser Ala Ile Ser Leu Leu Asn Thr Thr Ala Ile
925 930 935

GTA GTA GCA GAA GGA ACA GAT AGG ATT ATA GAA TTA GCA CAA AGA ATA 2496
Val Val Ala Glu Gly Thr Asp Arg Ile Ile Glu Leu Ala Gln Arg Ile
940 945 950

GGA AGG GGA ATA TTA CAC ATA CCT AGA AGA ATC AGA CAA GGC CTA GAA 2544
Gly Arg Gly Ile Leu His Ile Pro Arg Arg Ile Arg Gln Gly Leu Glu
955 960 965 970

AGA GCA CTG ATA TAA 2559
Arg Ala Leu Ile



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 852 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Lys Val Met Gly Met Gln Ser Gly Trp Met Gly Met Lys Ser Gly
1 5 10 15

Trp Leu Leu Phe Tyr Leu Leu Val Ser Leu Ile Lys Val Ile Gly Ser
20 25 30

Glu Gln His Trp Val Thr Val Tyr Tyr Gly Val Pro Val Trp Arg Glu
35 40 45

Ala Glu Thr Thr Leu Phe Cys Ala Ser Asp Ala Lys Ala His Ser Thr
50 55 60

Glu Ala His Asn Ile Trp Ala Thr Gln Ala Cys Val Pro Thr Asp Pro
65 70 75 80

Asn Pro Gln Glu Val Leu Leu Pro Asn Val Thr Glu Lys Phe Asn Met
85 90 95

Trp Glu Asn Lys Met Ala Asp Gln Met Gln Glu Asp Ile Ile Ser Leu
100 105 110

Trp Glu Gln Ser Leu Lys Pro Cys Val Lys Leu Thr Pro Leu Cys Val
115 120 125

Thr Met Leu Cys Asn Asp Ser Tyr Gly Glu Glu Arg Asn Asn Thr Asn
130 135 140

Met Thr Thr Arg Glu Pro Asp Ile Gly Tyr Lys Gln Met Lys Asn Cys
145 150 155 160

Ser Phe Asn Ala Thr Thr Glu Leu Thr Asp Lys Lys Lys Gln Val Tyr
165 170 175

Ser Leu Phe Tyr Val Glu Asp Val Val Pro Ile Asn Ala Tyr Asn Lys
180 185 190

Thr Tyr Arg Leu Ile Asn Cys Asn Thr Thr Ala Val Thr Gln Ala Cys
195 200 205

Pro Lys Thr Ser Phe Glu Pro Ile Pro Ile His Tyr Cys Ala Pro Pro
210 215 220

Gly Phe Ala Ile Met Lys Cys Asn Glu Gly Asn Phe Ser Gly Asn Gly
225 230 235 240

Ser Cys Thr Asn Val Ser Thr Val Gln Cys Thr His Gly Ile Lys Pro
245 250 255

Val Ile Ser Thr Gln Leu Ile Leu Asn Gly Ser Leu Asn Thr Asp Gly
260 265 270

[illegible] Val Ile Arg Asn Asp Ser His Ser Asn Leu Leu Val Gln Trp Asn
275 280 285

Glu Thr Val Pro Ile Asn Cys Thr Arg Pro Gly Asn Asn Thr Gly Gly
290 295 300

Gln Val Gln Ile Gly Pro Ala Met Thr Phe Tyr Asn Ile Glu Lys Ile
305 310 315 320

Val Gly Asp Ile Arg Gln Ala Tyr Cys Asn Val Ser Lys Glu Leu Trp
325 330 335

Glu Pro Met Trp Asn Arg Thr Arg Glu Glu Ile Lys Lys Ile Leu Gly
340 345 350

Lys Asn Asn Ile Thr Phe Arg Ala Arg Glu Arg Asn Glu Gly Asp Leu
355 360 365

Glu Val Thr His Leu Met Phe Asn Cys Arg Gly Glu Phe Phe Tyr Cys
370 375 380

Asn Thr Ser Lys Leu Phe Asn Glu Glu Leu Leu Asn Glu Thr Gly Glu
385 390 395 400

Pro Ile Thr Leu Pro Cys Arg Ile Arg Gln Ile Val Asn Leu Trp Thr
405 410 415

Arg Val Gly Lys Gly Ile Tyr Ala Pro Pro Ile Arg Gly Val Leu Asn
420 425 430

Cys Thr Ser Asn Ile Thr Gly Leu Val Leu Glu Tyr Ser Gly Gly Pro
435 440 445

Asp Thr Lys Glu Thr Ile Val Tyr Pro Ser Gly Gly Asn Met Val Asn
450 455 460

Leu Trp Arg Gln Glu Leu Tyr Lys Tyr Lys Val Val Ser Ile Glu Pro
465 470 475 480

Ile Gly Val Ala Pro Gly Lys Ala Lys Arg Arg Thr Val Ser Arg Glu
485 490 495

Lys Arg Ala Ala Phe Gly Leu Gly Ala Leu Phe Leu Gly Phe Leu Gly
500 505 510

Ala Ala Gly Ser Thr Met Gly Ala Ala Ser Ile Thr Leu Thr Val Gln
515 520 525

Ala Arg Thr Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Ile Leu Leu
530 535 540

Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Ser Ile Trp Gly
545 550 555 560

Ile Lys Gln Leu Gln Ala Lys Val Leu Ala Ile Glu Arg Tyr Leu Arg
565 570 575

Asp Gln Gln Ile Leu Ser Leu Trp Gly Cys Ser Gly Lys Thr Ile Cys
580 585 590

Tyr Thr Thr Val Pro Trp Asn Glu Thr Trp Ser Asn Asn Thr Ser Tyr
595 600 605

Asp Thr Ile Trp Asn Asn Leu Thr Trp Gln Gln Trp Asp Glu Lys Val
610 615 620

Arg Asn Tyr Ser Gly Val Ile Phe Gly Leu Ile Glu Gln Ala Gln Glu
625 630 635 640

Gln Gln Asn Thr Asn Glu Lys Ser Leu Leu Glu Leu Asp Gln Trp Asp
645 650 655

Ser Leu Trp Ser Trp Phe Gly Ile Thr Lys Trp Leu Trp Tyr Ile Lys
660 665 670

Ile Ala Ile Met Ile Val Ala Gly Ile Val Gly Ile Arg Ile Ile Ser
675 680 685

Ile Val Ile Thr Ile Ile Ala Arg Val Arg Gln Gly Tyr Ser Pro Leu
690 695 700

Ser Leu Gln Thr Leu Ile Pro Thr Ala Arg Gly Pro Asp Arg Pro Glu
705 710 715 720

Glu Thr Glu Gly Gly Val Gly Glu Gln Asp Arg Gly Arg Ser Val Arg
725 730 735

Leu Val Ser Gly Phe Ser Ala Leu Val Trp Glu Asp Leu Arg Asn Leu
740 745 750

Leu Ile Phe Leu Tyr His Arg Leu Thr Asp Ser Leu Leu Ile Leu Arg
755 760 765

Arg Thr Leu Glu Leu Leu Gly Gln Ser Leu Ser Arg Gly Leu Gln Leu
770 775 780

Leu Asn Glu Leu Arg Thr His Leu Trp Gly Ile Leu Ala Tyr Trp Gly
785 790 795 800

Lys Glu Leu Arg Asp Ser Ala Ile Ser Leu Leu Asn Thr Thr Ala Ile
805 810 815

Val Val Ala Glu Gly Thr Asp Arg Ile Ile Glu Leu Ala Gln Arg Ile
820 825 830

Gly Arg Gly Ile Leu His Ile Pro Arg Arg Ile Arg Gln Gly Leu Glu
835 840 845

Arg Ala Leu Ile
850

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 639 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..636

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATG GGA AAG ATT TGG TCA AAG AGC AGC CTA GTA GGA TGG CCA GAA ATC 48
Met Gly Lys Ile Trp Ser Lys Ser Ser Leu Val Gly Trp Pro Glu Ile
855 860 865

AGA GAA AGA ATG AGA AGA CAA ACG CAA GAA CCA GCA GTA GAG CCA GCA 96
Arg Glu Arg Met Arg Arg Gln Thr Gln Glu Pro Ala Val Glu Pro Ala
870 875 880

GTA GGA GCA GGA GCA GCT TCT CAA GAT CTA GCT AAT CGA GGG GCC ATC 144
Val Gly Ala Gly Ala Ala Ser Gln Asp Leu Ala Asn Arg Gly Ala Ile
885 890 895 900

ACC ATA AGA AAT ACT AGA GAC AAT AAT GAA AGT ATA GCT TGG CTA GAA 192
Thr Ile Arg Asn Thr Arg Asp Asn Asn Glu Ser Ile Ala Trp Leu Glu
905 910 915

GCA CAA GAA GAA GAA GAG GAA GTA GGC TTT CCA GTA CGC CCT CAG GTA 240
Ala Gln Glu Glu Glu Glu Glu Val Gly Phe Pro Val Arg Pro Gln Val
920 925 930

CCA TTA AGG CCA ATA ACC TAT AAA CAG GCT TTT GAT CTT TCC TTC TTT 288
Pro Leu Arg Pro Ile Thr Tyr Lys Gln Ala Phe Asp Leu Ser Phe Phe
935 940 945

TTA AAA GAT AAG GGG GGA CTG GAA GGG CTA GTT TGG TCC AGA AAA AGG 336
Leu Lys Asp Lys Gly Gly Leu Glu Gly Leu Val Trp Ser Arg Lys Arg
950 955 960

CAA GAT ATT CTA GAC CTC TGG ATG TAT CAC ACA CAA GGC ATC CTC CCT 384
Gln Asp Ile Leu Asp Leu Trp Met Tyr His Thr Gln Gly Ile Leu Pro
965 970 975 980

GAC TGG CAT AAC TAC ACA CCA GGG CCA GGA ATT AGA TAC CCC GTA ACC 432
Asp Trp His Asn Tyr Thr Pro Gly Pro Gly Ile Arg Tyr Pro Val Thr
985 990 995

TTT GGA TGG TGC TTC AAA CTA GTA CCA TTG TCA GCT GAA GAA GTA GAA 480
Phe Gly Trp Cys Phe Lys Leu Val Pro Leu Ser Ala Glu Glu Val Glu
1000 1005 1010

GAG GCT AAT GAA GGA GAC AAC AAT GCC CTC TTA CAC CCC ATA TGT CAA 528
Glu Ala Asn Glu Gly Asp Asn Asn Ala Leu Leu His Pro Ile Cys Gln
1015 1020 1025

CAT GGA GCA GAT GAT GAT CAT AAA GAA GTG TTG GTG TGG CGA TTT GAC 576
His Gly Ala Asp Asp Asp His Lys Glu Val Leu Val Trp Arg Phe Asp
1030 1035 1040

AGC TCC CTA GCA AGA AGA CAT GTA GCA AGA GAG CTG CAT CCG GAG TTT 624
Ser Ser Leu Ala Arg Arg His Val Ala Arg Glu Leu His Pro Glu Phe
1045 1050 1055 1060

TAC AAG AAC TGC TGA 639
Tyr Lys Asn Cys 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Gly Lys Ile Trp Ser Lys Ser Ser Leu Val Gly Trp Pro Glu Ile
1 5 10 15

Arg Glu Arg Met Arg Arg Gln Thr Gln Glu Pro Ala Val Glu Pro Ala
20 25 30

Val Gly Ala Gly Ala Ala Ser Gln Asp Leu Ala Asn Arg Gly Ala Ile
35 40 45

Thr Ile Arg Asn Thr Arg Asp Asn Asn Glu Ser Ile Ala Trp Leu Glu
50 55 60

Ala Gln Glu Glu Glu Glu Glu Val Gly Phe Pro Val Arg Pro Gln Val
65 70 75 80

Pro Leu Arg Pro Ile Thr Tyr Lys Gln Ala Phe Asp Leu Ser Phe Phe
85 90 95

Leu Lys Asp Lys Gly Gly Leu Glu Gly Leu Val Trp Ser Arg Lys Arg
100 105 110

Gln Asp Ile Leu Asp Leu Trp Met Tyr His Thr Gln Gly Ile Leu Pro
115 120 125

Asp Trp His Asn Tyr Thr Pro Gly Pro Gly Ile Arg Tyr Pro Val Thr
130 135 140

Phe Gly Trp Cys Phe Lys Leu Val Pro Leu Ser Ala Glu Glu Val Glu
145 150 155 160

Glu Ala Asn Glu Gly Asp Asn Asn Ala Leu Leu His Pro Ile Cys Gln
165 170 175

His Gly Ala Asp Asp Asp His Lys Glu Val Leu Val Trp Arg Phe Asp
180 185 190

Ser Ser Leu Ala Arg Arg His Val Ala Arg Glu Leu His Pro Glu Phe
195 200 205

Tyr Lys Asn Cys 
210 
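
SEQ ID NO: 20 is the conceptual translation of the CDS annotated at positions 1..636 of SEQ ID NO: 19 (636 coding nucleotides, that is 212 codons, followed by the TGA stop codon). The relationship can be checked with a minimal sketch such as the one below. It assumes the standard genetic code; the function name `translate` and the variable names are illustrative only and do not appear in the specification. Only the first and last rows of SEQ ID NO: 19 are reproduced, to keep the sketch short.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids.

    Whitespace is ignored; translation stops at the first stop codon.
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


# First and last rows of SEQ ID NO: 19 as listed above.
first_row = "ATG GGA AAG ATT TGG TCA AAG AGC AGC CTA GTA GGA TGG CCA GAA ATC"
last_row = "TAC AAG AAC TGC TGA"

print(translate(first_row))  # one-letter form of residues 1 to 16
print(translate(last_row))   # one-letter form of residues 209 to 212
```

Running the same function over the full 639-nucleotide sequence would be expected to yield a 212-residue protein, provided the sequence is transcribed without error.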

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
ATTGCGTACT CACACTTCCG 20

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GGCAAGCAGG GAGCTGG 17

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TCCTTGAGCA GTCTGGAC 18

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GAACAGGAGG ATTAGCAG 18

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
AGCAGAGGCT ATGTCACA 18

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
TGTAAGGCCC CTAGAAGAG 19

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
ACAGAGAACT CTCTGTAC 18

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
AAGAAAAGCA GTTGGTAC 18

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TTTCTTCCCT GTATGTC 17

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GTTATATGGA TTCTCAGG 18

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TGGCAGCACA TTATACTGG 19

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
ATCATTTACC AGTACATGGA CGA 23

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
TGTCAGGGGT CGTAAAGC 18

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TCCTCTGGAT GGGATATG 18

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TCTATCCAGG AATCAGAG 18

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
AATGAGATCT GCCCATAC 18

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
TGACAGATAG GGGAAGAC 18

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
AACCGCCATT TGCACTGC 18

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
ACATGGACCG CCACAAGG 18

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
AGCAACAGAC ATACAGAC 18

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AAAGTAGTCC CACGTAGG 18

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
ATATCCCAGT AGGTCAGG 18

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
TCTAGCACTA ACAGCCTG 18

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
ACTCTTACTG CTCTGAGG 18

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
CCATAGTACA CTGTTACC 18

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
CATAGCTATC GTTACAAAGC 20

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
TCATAATGGC AAAGCCTG 18

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
CTATTCCACA TTGGTTCC 18

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
ATTCTAGAAC CAGTCCAG 18

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
CCTTAGGGAT CAGCAAATCC 20

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
TGGGACAGTC TGTGGAGC 18

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TTCTCAGCTC TTGTCTGG 18

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
ATTAAGCAAG CTGATAGC 18

(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
TGTGCTTCTA GCCAAG 16

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GCTCCATGTT GACATATG 18

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
AGAGAGACCC AGTACAAG 18

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "PRIMER"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
ATAAAAGCAG CCGCTTCTCG 20

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Cys Thr Arg Pro Gly Asn Asn Thr Gly Gly Gln Val Gln Ile Gly Pro
1 5 10 15

Ala Met Thr Phe Tyr Asn Ile Glu Lys Ile Val Gly Asp Ile Arg Gln
20 25 30

Ala Tyr Cys
35

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

Cys His Arg Pro Gly Asn Asn Thr Arg Gly Glu Val Gln Ile Gly Pro
1 5 10 15

Gly Met Thr Phe Tyr Asn Ile Glu Asn Val Tyr Gly Asp Thr Arg Ser
20 25 30

Ala Tyr Cys
35

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Cys Ile Arg Pro Gly Asn Arg Thr Tyr Arg Asn Leu Gln Ile Gly Pro
1 5 10 15

Gly Met Thr Phe Tyr Asn Val Glu Ile Ala Thr Gly Asp Ile Arg Lys
20 25 30

Ala Phe Cys
35

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Cys Thr Arg Pro Asn Asn Asn Thr Arg Lys Ser Val Arg Ile Gly Pro
1 5 10 15

Gly Gln Ala Phe Tyr Ala Thr Gly Asp Ile Ile Gly Asp Ile Arg Gln
20 25 30

Ala His Cys
35
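
The 35-residue peptides of SEQ ID NOS: 58 to 61 have equal length, so they can be compared position by position. The sketch below computes a simple percent identity between SEQ ID NO: 58 and SEQ ID NO: 59. It is an illustration only: the function name `percent_identity` is hypothetical, and ungapped position-wise identity is an assumed stand-in, not necessarily the measure of homology applied to the envelope gene products elsewhere in this specification.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions carrying the same residue in two equal-length
    peptides written as space-separated three-letter codes (no gaps)."""
    res_a, res_b = a.split(), b.split()
    if len(res_a) != len(res_b):
        raise ValueError("peptides must have the same number of residues")
    matches = sum(x == y for x, y in zip(res_a, res_b))
    return 100.0 * matches / len(res_a)


# Residues as listed for SEQ ID NO: 58 and SEQ ID NO: 59.
SEQ_58 = (
    "Cys Thr Arg Pro Gly Asn Asn Thr Gly Gly Gln Val Gln Ile Gly Pro "
    "Ala Met Thr Phe Tyr Asn Ile Glu Lys Ile Val Gly Asp Ile Arg Gln "
    "Ala Tyr Cys"
)
SEQ_59 = (
    "Cys His Arg Pro Gly Asn Asn Thr Arg Gly Glu Val Gln Ile Gly Pro "
    "Gly Met Thr Phe Tyr Asn Ile Glu Asn Val Tyr Gly Asp Thr Arg Ser "
    "Ala Tyr Cys"
)

print(f"{percent_identity(SEQ_58, SEQ_59):.1f}% identical positions")
```

A gapped alignment would be needed for sequences of unequal length; that case is outside the scope of this sketch.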
THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

1. An isolated non-M, non-O strain of HIV-1, a
sample of which was deposited on 2 July 1996 under number
I-1753 (designated YBF30) in the Collection Nationale de
Cultures de Microorganismes (National Collection of
Microorganism Cultures) kept by the Pasteur Institute.

2. An isolated nucleic acid sequence, wherein the
sequence is derived from the strain according to Claim 1
and is selected from the group consisting of: SEQ ID NO:1,
SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID
NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID
NO:17, SEQ ID NO:19 and SEQ ID NOS:21 to 57, and wherein
said sequence is capable of hybridizing with a nucleic
acid sequence which is derived from a non-M, non-O HIV-1
virus.

3. An isolated oligonucleotide wherein said
oligonucleotide comprises a nucleic acid sequence selected
from the group consisting of SEQ ID NOS:21 to 57, and
wherein said oligonucleotide is capable of being used as a
primer or as a probe for detecting a non-M, non-O HIV-1
strain according to Claim 1.

4. An isolated non-M, non-O strain HIV-1 virus,
wherein the virus exhibits the following characteristics:

(a) little or no serological reactivity with
regard to proteins of the M and O groups and strong
serological reactivity with regard to proteins which are
derived from the YBF30 strain according to Claim 1 or the
CPZGAB SIV strain;

(b) absence of genomic amplification when using
primers from the env and gag regions of the HIV-1 viruses
of the M and O groups;

(c) genomic amplification in the presence of
the primers which are derived from the YBF30 strain
according to Claim 3; and

(d) greater than 70% sequence homology with the
polynucleotide or polypeptide products of the envelope
gene with regard to the corresponding polypeptide or
polynucleotide products of the envelope gene of YBF30
strain.


5. An oligonucleotide which comprises a nucleic acid
sequence selected from the group consisting of: SEQ ID
NOS:21 to 57, and wherein said oligonucleotide is capable
of being used as a primer or as a probe for detecting a
non-M, non-O strain of HIV-1 according to Claim 4.

6. A method of in vitro diagnosis of non-M, non-O
strain HIV-1 virus comprising the steps of:

(a) providing a biological sample suspected of
comprising a nucleic acid sequence to be detected;

(b) hybridizing the nucleic acid of (a) with at
least one nucleic acid sequence according to Claim 2 or
Claim 3; and

(c) detecting the presence of the hybridized
nucleic acid sequence.

7. An isolated peptide capable of being expressed by
a non-M, non-O strain of HIV-1 virus according to Claim 1
or Claim 4 or encoded by a nucleotide sequence according
to Claim 2, wherein said peptide is capable of:

(a) being recognized by antibodies which are
induced by a non-M, non-O HIV-1 virus according to Claim 1
or Claim 4, or a variant of this virus, and which are
present in a biological sample which is obtained following
an infection with a non-M, non-O HIV-1 strain; and/or

(b) inducing the production of anti-non-M, non-O
HIV-1 antibodies.

8. A peptide according to Claim 7, wherein said
peptide is expressed by a nucleic acid sequence selected
from the group consisting of: SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ
ID NO:16, SEQ ID NO:18, SEQ ID NO:58 and SEQ ID NO:20.

9. An immunogenic composition comprising one or more
translation products of the nucleotide sequences according
to Claim 2 and/or one of the peptides according to Claim 7
or Claim 8.

10. An isolated antibody directed against one or more
of the peptides according to Claim 7 or Claim 8.

11. A method for the in vitro diagnosis of a non-M,
non-O strain of HIV-1 virus, comprising the steps of:

(a) providing a biological sample to be tested;

(b) combining the sample of (a) with an
antibody according to Claim 10 [which may possibly be
combined with anti-CPZGAB SIV antibodies]; and

(c) detecting the presence of antibody-antigen
complexes.

12. A method according to Claim 11, wherein the
antibody of step (b) is also combined with anti-CPZGAB SIV
antibodies.

13. A reagent for diagnosing a non-M, non-O strain
HIV-1 virus, comprising a nucleic acid or peptide sequence
according to any one of Claims 2, 3, 7 or 8.
14. A method for screening and typing a non-M, non-O
strain HIV-1 virus, comprising the steps of:

(a) bringing a nucleic acid sequence selected
from the group consisting of SEQ ID NOS: 2, 3, 5, 7, 9,
11, 13, 15, 17, 19 and 21 to 57 into contact with the
nucleic acid of the virus to be typed; and

(b) detecting the hybrid which is formed.

15. A kit for diagnosing a non-M, non-O strain HIV-1
virus, comprising a reagent according to Claim 13.

16. An isolated virus according to Claim 1,
substantially as herein described with reference to any
one of the examples or figures.

17. An isolated virus according to Claim 4,
substantially as herein described with reference to any
one of the examples or figures.

Dated this 16th day of July 2001

INSTITUT NATIONAL DE LA SANTE ET DE LA RECHERCHE MEDICALE
(INSERM) and ASSISTANCE PUBLIQUE-HOPITAUX DE PARIS and
INSTITUT PASTEUR
By their Patent Attorneys
GRIFFITH HACK

Fellows Institute of Patent and
Trade Mark Attorneys of Australia
Primer      Region  Sequence

YLG         ltr     ATTGCGTACTCACACTTCCG
LPBS.1      ltr     GGCAAGCAGGGAGCTGG
GAGYAS1.1   ltr     TCCTTGAGCAGTCTGGAC
GAGYAS1     gag     GAACAGGAGGATTAGCAG
Gag 6       gag     AGCAGAGGCTATGTCACA
GAGYS1      gag     TGTAAGGCCCCTAGAAGAG
GAGYS1.1    gag     ACAGAGAACTCTCTGTAC
GAGYS1.2    gag     AAGAAAAGCAGTTGGTAC
YRTAS1.3    pol     TTTCTTCCCTGTATGTC
YRTAS1.2    pol     GTTATATGGATTCTCAGG
YRTAS1.1    pol     TGGCAGCACATTATACTGG
YRT2        pol     ATCATTTACCAGTACATGGACGA
YRTAS1      pol     TGTCAGGGGTCGTAAAGC
YRT2-1      pol     TCCTCTGGATGGGATATG
YRT2-2      pol     TCTATCCAGGAATCAGAG
YRT-3       pol     AATGAGATCTGCCCATAC
YRT2-4      pol     TGACAGATAGGGGAAGAC
4481-1      pol     AACCGCCATTTGCACTGC
4481-2      pol     ACATGGACCGCCACAAGG
4235.1      pol     AGCAACAGACATACAGAC
4235.2      vif     AAAGTAGTCCCACGTAGG
4235.3      tat     ATATCCCAGTAGGTCAGG
4235.4      tat     TCTAGCACTAACAGCCTG
SK69.6      env     ACTCTTACTGCTCTGAGG
SK69.5      env     CCATAGTACACTGTTACC
SK69.4      env     CATAGCTATCGTTACAAAGC
SK69.3      env     TCATAATGGCAAAGCCTG
SK69.2      env     CTATTCCACATTGGTTCC
SK69.1      env     ATTCTAGAACCAGTCCAG
SK68.1      env     CCTTAGGGATCAGCAAATCC
SK68.2      env     TGGGACAGTCTGTGGAGC
SK68.3      env     TTCTCAGCTCTTGTCTGG
LSIAS1.3    nef     ATTAAGCAAGCTGATAGC
LSIAS1.2    nef     TGTGCTTCTAGCCAAG
LSIAS1.1    ltr     GCTCCATGTTGACATATG
LSiA1       ltr     AGAGAGACCCAGTACAAG
YLPA        ltr     ATAAAAGCAGCCGCTTCTCG
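
A primer can be located on a template by searching for the primer itself (sense orientation) or for its reverse complement (antisense orientation). The sketch below does this for SEQ ID NO: 52 and SEQ ID NO: 53 against nucleotides 2209 to 2448 of the env coding sequence listed immediately before SEQ ID NO: 18. Treating these two primers as a pair is an assumption made purely for illustration, and an exact string match is only a sanity check, not a model of hybridization. The helper name `reverse_complement` is hypothetical.

```python
# Complement map for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Nucleotides 2209..2448 of the env coding sequence, copied row by row
# from the sequence listing; whitespace is stripped by split/join.
ENV_FRAGMENT = "".join("""
TTA GTG AGC GGA TTC TCA GCT CTT GTC TGG GAG GAC CTC CGG AAC CTG
TTG ATC TTC CTC TAC CAC CGC TTG ACA GAC TCA CTC TTG ATA CTG AGG
AGG ACT CTG GAA CTC CTG GGA CAG AGT CTC AGC AGG GGA CTG CAA CTA
CTG AAT GAA CTC AGA ACA CAC TTG TGG GGA ATA CTT GCA TAT TGG GGA
AAA GAG TTA AGG GAT AGT GCT ATC AGC TTG CTT AAT ACA ACA GCT ATT
""".split())

sense_primer = "TTCTCAGCTCTTGTCTGG"      # SEQ ID NO: 52
antisense_primer = "ATTAAGCAAGCTGATAGC"  # SEQ ID NO: 53

# Sense primer matches the template directly; the antisense primer
# matches through its reverse complement.
start = ENV_FRAGMENT.find(sense_primer)
antisense_site = ENV_FRAGMENT.find(reverse_complement(antisense_primer))
end = antisense_site + len(antisense_primer)

if start >= 0 and antisense_site >= 0:
    print(f"sense primer at offset {start}, product length {end - start} nt")
else:
    print("one or both primers not found in this fragment")
```

Offsets are zero-based and relative to the fragment, so the position on the full coding sequence is obtained by adding 2209 for one-based coordinates.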
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